Nucleic acid binding of multi-zinc finger transcription factors

ABSTRACT

A method of identifying transcription factors comprising providing cells with a nucleic acid sequence at least comprising a sequence CACCT (SEQ ID NO: ______ ) as bait for the screening of a library encoding potential transcription factors and performing a specificity test to isolate said factors. Preferably, the bait comprises twice the CACCT (SEQ ID NO: ______ ) sequence; more particularly, the bait comprises one of the sequences CACCT-N-CACCT (SEQ ID NO: ______ ), CACCT-N-AGGTG (SEQ ID NO: ______ ), AGGTG-N-CACCT (SEQ ID NO: ______ ), or AGGTG-N-AGGTG (SEQ ID NO: ______ ), wherein N is a spacer sequence. The transcription factors identified using the methods of the invention contain separated clusters of zinc fingers, such as, for example, a two-handed zinc finger transcription factor. Also, at least one such zinc finger transcription factor, denominated SIP1, induces tumor metastasis by down-regulating the expression of E-cadherin. Compounds interfering with SIP1 activity can thus be used to prevent tumor invasion and metastasis.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of International Application No. PCT/EP00/05582, filed on Jun. 9, 2000, designating the United States of America (International Publication No. WO 01/00864, published Jan. 4, 2001), the entire contents of which are incorporated by this reference.

TECHNICAL FIELD

[0002] The invention relates to biotechnology generally, and more specifically to a method of identifying transcription factors.

BACKGROUND

[0003] Zinc fingers are among the most common DNA binding motifs found in eukaryotes. It is estimated that there are 500 zinc finger proteins encoded by the yeast genome and that perhaps 1% of all mammalian genes encode zinc finger-containing proteins. These proteins are classified according to the number and position of the cysteine and histidine residues available for zinc coordination.

[0004] The CCHH class, typified by the Xenopus transcription factor IIIA (Ref. 19), is the largest. These proteins contain two or more fingers in tandem repeats. In contrast, the steroid receptors contain only cysteine residues, which form two types of zinc-coordinated structures with four (C₄) and five (C₅) cysteines (28). Another class of zinc fingers contains the CCHC fingers. The CCHC fingers, which are found in Drosophila and in mammalian and retroviral proteins, display the consensus sequence C-X₂-C-X₄-H-X₄-C (Refs. 7, 21, 24). Recently, a novel configuration of CCHC finger, of the C-X₅-C-X₁₂-H-X₄-C type, was found in the neural zinc finger factor/myelin transcription factor family (Refs. 11, 12, 36). Finally, several yeast transcription factors such as GAL4 and CHA4 contain an atypical C₆ zinc finger structure that coordinates two zinc ions (Refs. 9, 32).
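The CCHC consensus patterns above map directly onto regular expressions in which each X is any amino acid. The following Python sketch scans a protein sequence for the two CCHC configurations. The function and variable names are illustrative and do not come from the source. The toy sequence is hypothetical and is not taken from any protein discussed here.

```python
import re

# Consensus patterns from the text; "." stands for any residue (the X positions).
CCHC_CLASSIC = re.compile(r"C.{2}C.{4}H.{4}C")   # C-X2-C-X4-H-X4-C
CCHC_NZF = re.compile(r"C.{5}C.{12}H.{4}C")      # C-X5-C-X12-H-X4-C (NZF/MyT1 type)


def find_zinc_finger_motifs(protein: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MSCKACAAAAHAAAACGG"  # hypothetical sequence with one C-X2-C-X4-H-X4-C motif
    print(find_zinc_finger_motifs(toy, CCHC_CLASSIC))  # [(2, 'CKACAAAAHAAAAC')]
    print(find_zinc_finger_motifs(toy, CCHC_NZF))      # []
```

A consensus match only flags a candidate finger. Whether the residues actually coordinate zinc has to be established experimentally.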

[0005] Zinc fingers are usually found in multiple copies (up to 37) per protein. These copies can be organized in a tandem array, forming a single cluster or multiple clusters, or they can be dispersed throughout the protein. Several families of transcription factors share the same overall structure, having two (or three) widely separated clusters of zinc fingers in their protein sequence. The first, the MBPs/PRDII-BF1 transcription factor family, includes the Drosophila Schnurri and Spalt genes (1, 3, 6, 14, 33). Both MBP-1 (also known as PRDII-BF1) and MBP-2 contain two widely separated clusters of two CCHH zinc fingers. The overall similarity between MBP-1 and MBP-2 is 51%, but the conservation is much higher (over 90%) for both the N-terminal and the C-terminal zinc finger clusters (33). This indicates an important role for both clusters in the function of these proteins. In addition, the N-terminal and C-terminal zinc finger clusters of MBP-1 are very homologous to each other (3).

[0006] The neural-specific zinc finger factor 1 and factor 3 (NZF-1 and NZF-3), as well as the myelin transcription factor 1 (MyT1, also known as NZF-2), belong to another family of proteins containing two widely separated clusters of CCHC zinc fingers (11, 12, 36). Like the MBP proteins, different NZF factors exhibit a high degree of sequence identity (over 80%) between the respective zinc finger clusters, whereas the sequences outside of the zinc finger region are largely divergent (36). In addition, each of these clusters can independently bind to DNA and recognizes similar core consensus sequences (11). NZF-3 binds to a DNA element containing a single copy of this consensus sequence but was shown to exhibit a marked enhancement in relative affinity for a bipartite element containing two copies of this sequence (36). This finding suggests that the NZF factors may also bind to reiterated sequences. However, the mechanism underlying the cooperative binding of NZF-3 to the bipartite element is currently unknown.

[0007] The Drosophila Zfh-1 and the vertebrate δEF1 proteins (also known as ZEB or AREB6) belong to a third family of transcription factors. This family is characterized by the presence of two separated clusters of CCHH zinc fingers and a homeodomain-like structure (see FIG. 1A) (Refs. 4, 5, 35). In δEF1, the N-terminal and C-terminal clusters are also very homologous and were shown to bind independently to very similar core consensus sequences (10). Recently, it was shown that mutant forms of δEF1 lacking either the N-terminal or the C-terminal cluster have lost their DNA binding capacity, indicating that both clusters are required for the binding of δEF1 to DNA (31). The Evi-1 transcription factor was shown to contain 10 CCHH zinc fingers: seven in the N-terminal region and three in the C-terminal region (22). With this factor the situation differs from that of the transcription factors described above, because the two clusters bind to two different target sequences, which are bound simultaneously by full-length Evi-1 (20). Binding of full-length Evi-1 is mainly observed when the two target sequences are positioned in a certain relative orientation, but there was no strict requirement for an optimal spacing between these two targets.

[0008] Cell-cell adhesion is essential during cell differentiation, tissue development, and tissue homeostasis. The effect of disrupted cell-cell adhesion is evident in many cancers, where metastasis and poor prognosis are correlated with loss of cell-cell adhesion. E-cadherin, a homophilic Ca²⁺-dependent transmembrane adhesion molecule, and the associated catenins are among the major constituents of the epithelial cell-junction system. E-cadherin acts as a potent invasion suppressor in tumor cell line systems (Refs. 46, 47) and in in vivo tumor model systems (Ref. 48). Loss of E-cadherin expression during tumor progression has been described for more than 15 different carcinoma types (49). Extensive analyses have made clear that aberrant E-cadherin expression resulting from somatic inactivating mutations of both E-cadherin alleles is rare and so far largely confined to diffuse gastric carcinomas and infiltrative lobular breast carcinomas (50, 51). Northern analysis and in situ hybridization studies revealed that reduced E-cadherin immunoreactivity in human carcinomas correlates with decreased mRNA levels (52-54). Analysis of mouse and human E-cadherin promoter sequences revealed a conserved modular structure with positive regulatory elements, including a CCAAT-box and a GC-box, as well as two E-boxes (CANNTG) with a potential repressor role (Refs. 55, 56). Mutation analysis of the two E-boxes in the E-cadherin promoter demonstrated that they play a crucial role in the epithelium-specific expression of E-cadherin: mutation of these two E-box elements results in up-regulation of the E-cadherin promoter in dedifferentiated cancer cells, where the wild-type promoter shows low activity (55, 56).

SUMMARY OF THE INVENTION

[0009] The invention relates to a method of identifying transcription factors, involving providing cells with a nucleic acid sequence including a sequence CACCT (the first 5 nucleotides of SEQ ID NO: 1) as bait for the screening of a library encoding potential transcription factors and performing a specificity test to isolate the factors. Transcription factors identified using the method contain separated clusters of zinc fingers, such as, for example, a two-handed zinc finger transcription factor. At least one such zinc finger transcription factor, denominated “SIP1”, induces tumor metastasis by down-regulating the expression of E-cadherin. Compounds interfering with SIP1 activity can thus be used to prevent tumor invasion and metastasis.

[0010] The mechanism of DNA binding remains poorly understood for most of the previously identified complex factors. We have characterized the DNA binding properties of vertebrate transcription factors belonging to the emerging family of two-handed zinc finger transcription factors, such as δEF1 and SIP1. SIP1, a member of this family, was recently isolated and characterized as a Smad-interacting protein (Ref. 34). SIP1 and δEF1, the latter a transcriptional repressor involved in skeletal development and muscle cell differentiation, belong to the same family of transcription factors. They contain two separated clusters of CCHH zinc fingers, which share high sequence identity (>90%). The DNA-binding properties of these transcription factors have been investigated. The N-terminal and C-terminal clusters of SIP1 also show high sequence homology to each other, and according to the invention each binds to a 5′-CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1). Furthermore, high-affinity binding sites for full-length SIP1 and δEF1 in the promoter regions of candidate target genes such as Brachyury, α4-integrin and E-cadherin are bipartite elements composed of one CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1) and one CACCTG sequence. No strict requirement for the relative orientation of the two sequences was observed, and the spacing between them (also denominated N) may vary from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . , to at least 44 bp. For binding to these bipartite elements, the integrity of both SIP1 zinc finger clusters is necessary, indicating that both are involved in binding to DNA. Furthermore, SIP1 binds as a monomer to a CACCT-N-CACCTG site (SEQ ID NO: 1), with one zinc finger cluster contacting the CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1) and the other zinc finger cluster binding to the CACCTG sequence.
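The CACCTG half of these bipartite elements matches the E-box consensus CANNTG mentioned for the E-cadherin promoter in the background section. The reverse complement of CANNTG is again CANNTG, so scanning one strand finds every E-box. The minimal Python sketch below assumes plain A/C/G/T input. The function name is hypothetical and the test sequence is invented for illustration.

```python
import re

# E-box consensus CANNTG; the zero-width lookahead also reports overlapping sites.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every CANNTG on the given strand.

    CANNTG is its own reverse-complement consensus, so a site on the opposite
    strand occupies the same coordinates and needs no second pass.
    """
    return [(m.start(), m.group(1)) for m in EBOX.finditer(dna.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: a CACCTG site and its reverse complement, CAGGTG.
    print(find_eboxes("GGCACCTGAACAGGTGTT"))  # [(2, 'CACCTG'), (10, 'CAGGTG')]
```

The sketch illustrates a general point: a CACCTG site read on the opposite strand appears as CAGGTG, which contains the AGGTG half-site.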

[0011] This binding mode may be generalized to other transcription factors that contain separated clusters of zinc fingers and may apply to other Smad-binding proteins. Moreover, the Smad-interacting protein SIP1 is highly expressed in E-cadherin-negative human carcinoma cell lines, resulting in down-regulation of E-cadherin transcription. Conditional expression of SIP1 in E-cadherin-positive MDCK cells also abrogates E-cadherin-mediated intercellular adhesion and simultaneously induces invasion. Hence, SIP1 can be considered a potent invasion-promoter molecule, and compounds that interfere with SIP1 production or activity, such as anti-SIP1 antibodies, small molecules specifically binding to SIP1, anti-sense nucleic acids and ribozymes, can prevent tumor invasion and metastasis.

[0012] The invention thus includes a method of identifying transcription factors such as activators and/or repressors. The method comprises providing cells with a nucleic acid sequence at least comprising a sequence CACCT (the first 5 nucleotides of SEQ ID NO: 1) or AGGTG (the first 5 nucleotides of SEQ ID NO: 3) (preferably, twice the CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1)) as bait for the screening of a library encoding potential transcription factors and performing a specificity test to isolate the factors.

[0013] In another embodiment, the bait comprises one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3) or AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer sequence. The spacer sequence can vary in length and can contain any number of base pairs (“bp”) from N = 0 bp to at least N = 44 bp. Thus, for example, N can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300 or 400 bp in length.
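The four bait orientations follow from the fact that AGGTG is the reverse complement of CACCT. The Python sketch below builds a bait from two half-sites and a spacer, and scans a sequence for half-site pairs in any of the four orientations. All function names are hypothetical. The default spacer window of 0 to 44 bp mirrors the range stated above. Longer spacers are also contemplated, so `max_spacer` is left adjustable.

```python
import re

HALF_SITES = ("CACCT", "AGGTG")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def make_bait(first: str, spacer: str, second: str) -> str:
    """Assemble CACCT-N-CACCT, CACCT-N-AGGTG, AGGTG-N-CACCT or AGGTG-N-AGGTG."""
    if first not in HALF_SITES or second not in HALF_SITES:
        raise ValueError("half-sites must be CACCT or AGGTG")
    return first + spacer.upper() + second


def find_bipartite_sites(dna: str, min_spacer: int = 0, max_spacer: int = 44):
    """Return (start, first_half, spacer_length, second_half) for every pair of
    half-sites whose spacer length lies within [min_spacer, max_spacer]."""
    dna = dna.upper()
    # Every half-site occurrence, sorted by position (lookahead allows overlaps).
    hits = sorted(
        (m.start(), site) for site in HALF_SITES for m in re.finditer(f"(?={site})", dna)
    )
    sites = []
    for i, (pos1, site1) in enumerate(hits):
        for pos2, site2 in hits[i + 1:]:
            spacer = pos2 - (pos1 + len(site1))
            if spacer > max_spacer:
                break  # hits are sorted, so later ones lie even farther downstream
            if spacer >= min_spacer:
                sites.append((pos1, site1, spacer, site2))
    return sites


if __name__ == "__main__":
    assert revcomp("CACCT") == "AGGTG"
    bait = make_bait("CACCT", "GGG", "AGGTG")  # hypothetical 3-bp spacer
    print(bait, find_bipartite_sites(bait))  # CACCTGGGAGGTG [(0, 'CACCT', 3, 'AGGTG')]
```

The pairwise scan reports every half-site pair within the window, which mirrors the observation that orientation is not strictly constrained. Ranking candidates by actual binding affinity would still require experimental data.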

[0014] The transcription factor(s) identified using a method according to the invention comprise separated clusters of zinc fingers, such as, for example, two-handed zinc finger transcription factors.

[0015] These sequences may originate from any promoter region, but preferably originate from the promoter region of a gene (also referred to as a “target gene”) selected from Brachyury, α4-integrin, follistatin and E-cadherin.

[0016] The invention includes the transcription factors obtainable by and produced by a method according to the invention.

[0017] In another embodiment, the invention relates to a method of identifying, isolating, and/or producing compounds with an interference capability towards the transcription factors obtained as described herein. For example, the invention includes a method comprising: (a) adding a sample comprising a potential compound to be identified to a test system comprising (i) a nucleotide sequence comprising one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4, wherein N in these sequences is a spacer sequence as previously described, and (ii) a protein capable of binding the nucleotide sequence; (b) incubating the sample in the system for a period sufficient to permit interaction of the compound, or its derivative or counterpart, with the protein; and (c) comparing the amount and/or activity of the protein bound to the nucleotide sequence before and after the addition.

[0018] The amount of protein bound to the nucleotide sequence before and after adding the test sample can be compared, for example, by using a gel band-shift assay or a filter-binding assay. As a next step, the compound thus identified can be isolated, optionally purified, and further analyzed according to methods known to persons skilled in the art. The protein of component (ii) of the test system can be any protein capable of binding the nucleotide sequence, but is preferably a Smad-interacting protein such as SIP1.
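To illustrate the comparison step, band intensities from a gel band-shift assay can be converted into a fraction of probe bound and a percent inhibition. This sketch assumes the intensities come from densitometry; the text does not specify a quantification method. The function names and intensity values below are hypothetical.

```python
def fraction_bound(shifted: float, free: float) -> float:
    """Fraction of labeled probe in the protein-DNA complex (the shifted band)."""
    total = shifted + free
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return shifted / total


def percent_inhibition(bound_before: float, bound_after: float) -> float:
    """Relative loss of binding after adding the test compound.

    A negative value means binding increased after the addition.
    """
    if bound_before <= 0:
        raise ValueError("no binding detected before adding the compound")
    return 100.0 * (1.0 - bound_after / bound_before)


if __name__ == "__main__":
    before = fraction_bound(shifted=800, free=200)  # hypothetical intensities
    after = fraction_bound(shifted=200, free=800)
    print(before, after, percent_inhibition(before, after))  # 0.8 0.2 75.0
```

A negative result corresponds to a compound that strengthens binding. The definition of interference capability given below also covers that case.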

[0019] Compounds identified by the latter method are also part of the present invention. The term ‘compounds with an interference capability towards transcription factors’ means compounds that are able to modulate (e.g., to inhibit, to weaken, and/or to strengthen) the bioactivity of transcription factors. More specifically, these compounds are able to completely or partially inhibit the production and/or bioactivity of SIP1. Examples of such compounds are small molecules, anti-SIP1 antibodies or functional fragments derived therefrom that specifically bind to SIP1 protein, anti-sense nucleic acids or ribozymes that bind to mRNA encoding SIP1, and small molecules that bind the promoter region bound by SIP1. In this regard, the present invention relates to compounds that modulate the regulation of E-cadherin expression by SIP1. More specifically, the present invention relates to compounds that, by inhibiting SIP1 production and/or activity, prevent the down-regulation of the expression of the target gene E-cadherin. In other words, the present invention relates to compounds that can be used as a medicament to prevent or treat tumor invasion and/or metastasis caused by the down-regulation of E-cadherin expression by SIP1. Methods to produce and use these compounds are exemplified further below.

[0020] The invention also includes a test kit for performing the method, comprising at least (i) a nucleotide sequence comprising one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4, wherein N in these sequences is a spacer sequence as previously described, and (ii) a protein capable of binding the nucleotide sequence.

[0021] In another embodiment, the invention concerns an alternative to the so-called “two hybrid” screening assay as disclosed in the prior art. Several means and methods have been developed to identify binding partners of proteins, and these have resulted in the identification of a number of binding proteins. Many of these proteins have been found using so-called “two hybrid” systems. Two-hybrid cloning systems have been developed in several labs (Chien et al., 1991; Durfee et al., 1993; Gyuris et al., 1993). All have three basic components: yeast vectors for expression of a known protein fused to a DNA-binding domain, yeast vectors that direct expression of cDNA-encoded proteins fused to a transcription activation domain, and yeast reporter genes that contain binding sites for the DNA-binding domain. These components differ in detail from one system to another. All systems use the DNA-binding domain from either Gal4 or LexA. The Gal4 domain is efficiently localized to the yeast nucleus, where it binds with high affinity to well-defined binding sites that can be placed upstream of reporter genes (Silver et al., 1986). LexA does not have a nuclear localization signal, but it enters the yeast nucleus and, when expressed at a sufficient level, efficiently occupies LexA binding sites (operators) placed upstream of a reporter gene (Brent et al., 1985). No endogenous yeast proteins bind to the LexA operators. Different systems also use different reporters. Most systems use a reporter that has a yeast promoter, from either the GAL1 gene or the CYC1 gene, fused to lacZ (Yocum et al., 1984). These lacZ fusions either reside on multicopy yeast plasmids or are integrated into a yeast chromosome. To make the lacZ fusions into appropriate reporters, the GAL1 or CYC1 transcription regulatory regions have been removed and replaced with binding sites that are recognized by the DNA-binding domain being used. 
A screen for activation of the lacZ reporters is performed by plating yeast on indicator plates that contain X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside); on this medium, yeast in which the reporters are transcribed produces β-galactosidase and turns blue. Some systems use a second reporter gene and a yeast strain that requires expression of this reporter to grow on a particular medium. These “selectable marker” genes usually encode enzymes required for the biosynthesis of an amino acid. Such reporters have the marked advantage of providing a selection for cDNAs that encode interacting proteins, rather than a visual screen for blue yeast. To make appropriate reporters from the marker genes, their upstream transcription regulatory elements were replaced by binding sites for a DNA-binding domain. The HIS3 and LEU2 genes have both been used as reporters in conjunction with appropriate yeast strains that require their expression to grow on media lacking histidine or leucine, respectively. Finally, different systems use different means to express activation-tagged cDNA proteins.

[0022] In all current schemes, the cDNA-encoded proteins are expressed with an activation domain at the amino terminus. The activation domains used include the strong activation domain from Gal4, the very strong activation domain from the Herpes simplex virus protein VP16, or a weaker activation domain derived from bacteria, called B42. The activation-tagged cDNA-encoded proteins are expressed either from a constitutive promoter or from a conditional promoter such as that of the GAL1 gene. Use of a conditional promoter makes it possible to quickly demonstrate that activation of the reporter gene is dependent on expression of the activation-tagged cDNA proteins.

[0023] It is clear from the foregoing that two-hybrid systems for finding binding proteins have been used in the past. However, although the conventional two-hybrid system has proven to be a valuable tool for finding proteinaceous molecules that can bind to other proteins, it is an artificial system. A characteristic of a two-hybrid system is that a fusion protein is made, consisting of a part for which binding partners are sought and a reporter part that enables detection of binding. For finding relevant binding partners, several criteria must be met, one of which is, of course, the correct choice of the region in the protein where binding to other proteins occurs. Another criterion, which is much more difficult, if not impossible, to predict accurately beforehand, is obtaining correct folding of the region (i.e., a folding of the region sufficiently similar to the folding of the region in the natural protein). Correct folding depends, among other things, on the actual amino acid sequence chosen for generating the fusion protein. Another factor determining the identification of relevant binding partners is the sensitivity with which binding can be detected.

[0024] An alternative to the conventional two-hybrid system is also provided herein. Thus, the invention provides an in vivo method and kit for detecting interactions between proteins, and the influence of other compounds on such interactions, using reconstitution of the activity of a transcriptional activator. This reconstitution makes use of two so-called hybrid, chimeric, or fused proteins. Each of these two fused proteins, independently of the other, shows a weak affinity towards a nucleic acid sequence comprising one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4, wherein N in these sequences is a spacer sequence as previously described. However, when both fused proteins are bound to this nucleic acid sequence and the test proteins carried by the two fused proteins are thereby brought into close proximity, the binding affinity towards the nucleic acid sequence becomes much stronger. If the two test proteins are indeed able to interact, they consequently bring the two domains of the transcriptional activator into close proximity. This proximity is sufficient to cause transcription, which can be detected by the activity of a marker gene located adjacent to the nucleic acid sequence.

[0025] Accordingly, a method is provided for detecting an interaction between a first interacting protein and a second interacting protein, comprising: providing a suitable host cell with a first fusion protein comprising the first interacting protein fused to a DNA-binding domain capable of binding a nucleic acid sequence comprising one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4, wherein N in these sequences is a spacer sequence as previously described; providing the host cell with a second fusion protein comprising the second interacting protein fused to a DNA-binding domain capable of binding such a nucleic acid sequence; subjecting the host cell to conditions under which the first and second interacting proteins are brought into close proximity; and determining whether a detectable gene present in the host cell and located adjacent to the nucleic acid sequence has been expressed to a greater degree than in the absence of the interaction between the first and the second interacting proteins.

[0026] For example, once a binding partner (prey) for a specific protein (bait) has been identified, the first fusion protein containing the bait will bind to the CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1) (or to the AGGTG sequence (the first 5 nucleotides of SEQ ID NO: 3)) of the sequence CACCT-N-AGGTG (SEQ ID NO: 2), and the second fusion protein containing the prey will bind to the AGGTG sequence (the first 5 nucleotides of SEQ ID NO: 3) (or to the CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1), respectively) of the same sequence CACCT-N-AGGTG (SEQ ID NO: 2), so that transcription of a marker gene occurs.

[0027] Finally, the present invention relates to the new sequences SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, and SEQ ID NO: 4, wherein N in these sequences is a spacer sequence as previously described, and to the use of these sequences, in addition to any other sequence at least comprising a sequence CACCT, for the identification, via any method known to a person skilled in the art, of new target genes different from the already described target genes Brachyury, α4-integrin, follistatin and E-cadherin.

BRIEF DESCRIPTION OF THE FIGURES

[0028] FIG. 1 is a schematic representation of Zfh-1, SIP1 and δEF1, and an alignment of the SIP1 and δEF1 zinc fingers. (A) Schematic representation of mouse δEF1 (1117 amino acids) and SIP1 (1214 amino acids). The filled boxes represent CCHH zinc fingers; the open boxes are CCHC zinc fingers. The homeodomain-like domain (HD) is depicted as an oval. The percentages represent the homology between the different domains. SIP1 polypeptides used in this study are depicted with their coordinates. SBD: Smad-binding domain (Verschueren et al., 1999). (B) Alignments of the amino acid sequences of the zinc fingers of SIP1 and δEF1. Vertical bars indicate sequence identity. The conserved cysteine and histidine residues forming the zinc fingers are printed in bold and indicated by an asterisk. The residues in the zinc fingers that can contact DNA are indicated with an arrow. (C) Alignment of the protein sequences of SIP1(NZF3+NZF4) and SIP1(CZF2+CZF3), and of δEF1(NZF3+NZF4) and δEF1(CZF2+CZF3), respectively, demonstrating intramolecular conservation of zinc fingers.
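The vertical bars in FIG. 1B mark identical residues in a pairwise alignment. The minimal Python sketch below shows how such a midline and a percent identity can be derived from two pre-aligned sequences. The function name is hypothetical and the toy sequences are invented. Excluding gap columns from the denominator is one common convention among several, not necessarily the one used for the figure.

```python
def identity_line(a: str, b: str) -> tuple[str, float]:
    """Return (midline, percent identity) for two pre-aligned, equal-length sequences.

    '|' marks identical residues, '-' marks a gap, and gap columns are
    excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    midline = []
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            midline.append(" ")  # gap column: not scored
            continue
        compared += 1
        if x == y:
            matches += 1
            midline.append("|")
        else:
            midline.append(" ")
    percent = 100.0 * matches / compared if compared else 0.0
    return "".join(midline), percent


if __name__ == "__main__":
    a, b = "CKACGK", "CRACGK"  # hypothetical aligned fragments
    mid, pct = identity_line(a, b)
    print(a, mid, b, f"{pct:.1f}%", sep="\n")
```

Percent similarity, as opposed to identity, would additionally count conservative substitutions using a substitution matrix. That is a different measure from the one computed here.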

[0029] FIG. 2 depicts possible DNA-binding mechanisms for SIP1. Model 1: SIP1 binds DNA as a monomer. Model 2: SIP1 binds DNA as a dimer.

DETAILED DESCRIPTION OF THE INVENTION

[0030] The following definitions are set forth to assist in the understanding of various terms used herein.

[0031] “Nucleic acid”, “nucleic acid sequence” or “nucleotide sequence” means genomic DNA, cDNA, double-stranded or single-stranded DNA, messenger RNA or any form of nucleic acid sequence known to one of skill in the art.

[0032] The terms “protein” and “polypeptide” are used interchangeably in this application. “Polypeptide” refers to a polymer of amino acids (amino acid sequence) and does not refer to a specific length of the molecule. Thus, peptides and oligopeptides are included within the definition of polypeptide. Included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (e.g., unnatural amino acids), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. The proteins and polypeptides described above are not necessarily translated from a designated nucleic acid sequence; the polypeptides may be generated in any manner, including, for example, chemical synthesis, expression in a recombinant expression system, or isolation from a suitable viral system.

[0033] The polypeptides may include one or more analogs of amino acids, phosphorylated amino acids, or unnatural amino acids. Methods of inserting analogs of amino acids into a sequence are known in the art. The polypeptides may also include one or more labels, which are known to those skilled in the art. In this context, it is also understood that the proteins may be further modified. By providing the proteins, it is also possible to determine fragments that retain biological activity, namely, the mature, processed form. This allows the construction of chimeric proteins and peptides comprising an amino acid sequence derived from the mature protein that is crucial for its binding activity. The other functional amino acid sequences may be either physically linked to the proteins by, for example, chemical means, or fused to them by recombinant DNA techniques well known in the art.

[0034] The terms “derivative”, “functional fragment of a sequence” and “functional part of a sequence” mean a truncated sequence of the original reference sequence. The truncated sequence (nucleic acid or protein) can vary widely in length; the minimum size is a sequence of sufficient size to provide at least a comparable function and/or activity to the original sequence referred to, while the maximum size is not critical. In some applications, the maximum size usually is not substantially greater than that required to provide the desired activity and/or function(s) of the original sequence. Typically, the truncated amino acid sequence will range from about 5 to about 60 amino acids in length. More typically, however, the sequence will be a maximum of about 50 amino acids in length, preferably a maximum of about 30 amino acids. It is usually desirable to select sequences of at least about 10, 12 or 15 amino acids, up to a maximum of about 20 or 25 amino acids.

[0035] The terms “gene(s)”, “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence”, “DNA sequence” or “nucleic acid molecule(s)” as used herein refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. These terms refer only to the primary structure of the molecule. Thus, they include double- and single-stranded DNA, and RNA. They also include known types of modifications, for example, methylation, “caps”, and substitution of one or more of the naturally occurring nucleotides with an analog.

[0036] A “coding sequence” is a nucleotide sequence that is transcribed into mRNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the 5′-terminus and a translation stop codon at the 3′-terminus. A coding sequence can include, but is not limited to, mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while introns may be present as well under certain circumstances.

[0037] “Transcription factor” means a class of proteins that bind to a promoter or to a nearby DNA sequence to facilitate or prevent transcription initiation.

[0038] “Promoter” means an oriented DNA sequence recognized by the RNA polymerase holoenzyme to initiate transcription.

[0039] “RNA polymerase” means a multi-subunit enzyme that synthesizes RNA complementary to the DNA template.

[0040] “Holoenzyme” means an active form of an enzyme that consists of multiple subunits.

[0041] The term ‘antibody’ or ‘antibodies’ refers to an antibody characterized as being specifically directed against a transcription factor such as SIP1 or any functional derivative thereof, the antibodies preferably being monoclonal antibodies; or an antigen-binding fragment thereof of the F(ab′)₂, Fab or single-chain Fv type; or any type of recombinant antibody derived therefrom. Monoclonal antibodies can, for instance, be produced by a hybridoma formed according to classical methods from splenic cells of an animal, particularly a mouse or rat, immunized against SIP1 or any functional derivative thereof, and from cells of a myeloma cell line, and selected by the ability of the hybridoma to produce monoclonal antibodies recognizing the SIP1 or functional derivative thereof initially used for immunizing the animals. Monoclonal antibodies may be humanized versions of mouse monoclonal antibodies made by means of recombinant DNA technology, starting from the mouse and/or human genomic DNA sequences coding for H and L chains or from cDNA clones coding for H and L chains. Alternatively, the monoclonal antibodies may be human monoclonal antibodies. Such human monoclonal antibodies are prepared, for instance, by means of human peripheral blood lymphocyte (PBL) repopulation of severe combined immune deficiency (SCID) mice as described in International Patent Application PCT/EP 99/03605, or by using transgenic non-human animals capable of producing human antibodies as described in U.S. Pat. No. 5,545,806, the contents of both of which are incorporated by this reference. Also, fragments derived from these monoclonal antibodies, such as Fab, F(ab′)₂ and scFv (“single-chain variable fragment”), form part of the present invention provided that they have retained the original binding properties. Such fragments are commonly generated by, for instance, enzymatic digestion of the antibodies with papain, pepsin, or other proteases. 
It is well known to the person skilled in the art thatmonoclonal antibodies, or fragments thereof, can be modified for varioususes. The antibodies can also be labeled with an appropriate label ofthe enzymatic, fluorescent, or radioactive type.

[0042] The term ‘small molecules’ refers to, for example, small organic molecules and other drug candidates, which can be obtained, for example, from combinatorial and natural product libraries via methods well known in the art. Random peptide libraries consisting of all possible combinations of amino acids attached to a solid-phase support may be used to identify peptides that are able to bind to SIP1 or to the promoter region bound by SIP1. The screening of peptide libraries may have therapeutic value in the discovery of pharmaceutical agents that act to inhibit the biological activity of SIP1.
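The phrase “all possible combinations of amino acids” grows very quickly with peptide length, which is why complete random libraries are practical only for short peptides. The sketch below simply counts the distinct linear sequences of a given length, assuming the 20 standard proteinogenic amino acids and ignoring modified or D-amino acids; the function name is illustrative.

```python
def library_size(length: int, alphabet: int = 20) -> int:
    """Count distinct linear peptides of `length` residues drawn from `alphabet` amino acids."""
    if length < 1:
        raise ValueError("peptide length must be at least 1")
    return alphabet ** length


if __name__ == "__main__":
    for n in range(4, 9):
        print(f"{n}-mers: {library_size(n):,}")
```

For example, a complete hexapeptide library already contains 20^6 = 64,000,000 distinct sequences.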

[0043] The terms ‘anti-sense nucleic acids’ and ‘ribozymes’ refer to molecules that function to inhibit the translation of SIP1 mRNA. Anti-sense nucleic acids, or anti-sense RNA and DNA molecules, act to directly block the translation of mRNA by binding to targeted mRNA and preventing protein translation. Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage of RNA. The mechanism of action of ribozymes involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage. Within the scope of the invention are engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of SIP1 RNA sequences. Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites (e.g., GUA, GUU and GUC). Once identified, short RNA sequences of between 15 and 20 ribonucleotides corresponding to the region of the target gene containing the cleavage site may be evaluated for predicted structural features, such as secondary structure, that may render the oligonucleotide sequence unsuitable. A candidate target's suitability may also be evaluated by testing its accessibility to hybridization with complementary oligonucleotides, using ribonuclease protection assays. Both anti-sense RNA and DNA molecules and ribozymes of the invention may be prepared, for example, by any method known in the art for the synthesis of RNA molecules. These include techniques for chemically synthesizing oligodeoxyribonucleotides well known in the art, such as solid-phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding the antisense RNA molecule. Such DNA sequences may be incorporated into a wide variety of vectors that incorporate suitable RNA polymerase promoters, such as the T7 or SP6 polymerase promoters. Alternatively, antisense cDNA constructs that synthesize anti-sense RNA constitutively or inducibly, depending on the promoter used, can be introduced stably into cell lines.
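The scanning step for hammerhead ribozyme sites described above can be sketched as a plain string search. This minimal sketch, assuming the target is supplied as a single RNA (or cDNA) string, lists every GUA, GUU and GUC triplet named in the text and cuts a 15-20 nt window containing each one for later evaluation; it does not predict secondary structure or accessibility, which the text assigns to separate steps. The target sequence in the usage code is hypothetical (not SIP1 mRNA), and all function names are illustrative.

```python
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")  # example triplets named in the text


def _as_rna(seq: str) -> str:
    """Upper-case the input and convert any DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def find_cleavage_sites(target: str) -> list[int]:
    """Return 0-based start positions of every GUA/GUU/GUC triplet in the target."""
    rna = _as_rna(target)
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in CLEAVAGE_TRIPLETS]


def candidate_windows(target: str, length: int = 17) -> list[tuple[int, str]]:
    """Return (site, window) pairs; each window is `length` nt and contains its triplet."""
    if not 15 <= length <= 20:
        raise ValueError("the text specifies windows of 15 to 20 ribonucleotides")
    rna = _as_rna(target)
    windows = []
    for site in find_cleavage_sites(rna):
        # Centre the window roughly on the triplet, then clamp it inside the target.
        start = min(max(0, site + 1 - length // 2), len(rna) - length)
        if start < 0:  # target shorter than the requested window
            continue
        windows.append((site, rna[start:start + length]))
    return windows


def antisense(target: str) -> str:
    """Reverse complement of an RNA target: the strand that would hybridize to it."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(_as_rna(target)))


if __name__ == "__main__":
    toy_target = "CCAGGUAUCCAGUUACCGGAUCCGUCAAGC"  # hypothetical 30-nt target
    print(find_cleavage_sites(toy_target))
    for site, window in candidate_windows(toy_target):
        print(site, window)
```

On the hypothetical target, the scan reports triplets at positions 4, 11 and 23; the window for the last site is shifted left so that it still fits inside the sequence.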

[0044] The mentioned antibodies, small molecules, anti-sense nucleic acids, and ribozymes can be used as ‘a medicament’ to prevent and/or treat tumor invasion and/or metastasis by inhibiting the down-regulation of E-cadherin expression by SIP1. Malignancy of tumors implies an inherent tendency of the tumor's cells to metastasize (invade the body widely and become disseminated by subtle means) and eventually to kill the patient unless all the malignant cells can be eradicated. Metastasis is thus the outstanding characteristic of malignancy. Metastasis is the tendency of tumor cells to be carried from their site of origin by way of the circulatory system and other channels, which may eventually establish these cells in almost every tissue and organ of the body. In contrast, the cells of a benign tumor invariably remain in contact with each other in one solid mass centered on the site of origin. Because of the physical continuity of benign tumor cells, they may be removed completely by surgery if the location is suitable. But the dissemination of malignant cells, each one individually possessing (through cell division) the ability to give rise to new masses of cells (new tumors) in new and distant sites, precludes complete eradication by a single surgical procedure in all but the earliest period of growth. It should be clear that the ‘medicament’ of the present invention could be used in combination with any other tumor therapy known in the art, such as irradiation, chemotherapy or surgery.

[0045] With regard to the above-mentioned small molecules, the term ‘medicament’ relates to a composition comprising small molecules as described above and a pharmaceutically acceptable carrier or excipient (both terms can be used interchangeably) to treat diseases as indicated above. Suitable carriers or excipients known to the person skilled in the art are saline, Ringer's solution, dextrose solution, Hanks' solution, fixed oils, ethyl oleate, 5% dextrose in saline, substances that enhance isotonicity and chemical stability, buffers and preservatives. Other suitable carriers include any carrier that does not itself induce the production of antibodies harmful to the individual receiving the composition, such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids and amino acid copolymers.

[0046] The ‘medicament’ may be administered by any suitable method within the knowledge of the person skilled in the art. The preferred route of administration is parenteral. In parenteral administration, the medicament of this invention will be formulated in a unit dosage injectable form, such as a solution, suspension or emulsion, in association with the pharmaceutically acceptable excipients as defined above.

[0047] However, the dosage and mode of administration will depend on the individual. Generally, the medicament is administered so that the molecule of the present invention is given at a dose between 1 μg/kg and 10 mg/kg, more preferably between 10 μg/kg and 5 mg/kg, most preferably between 0.1 and 2 mg/kg. Preferably, it is given as a bolus dose. Continuous infusion may also be used and includes continuous subcutaneous delivery via an osmotic minipump. If so, the medicament may be infused at a dose between 5 and 20 μg/kg/minute, more preferably between 7 and 15 μg/kg/minute.
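The ranges above are expressed per kilogram of body weight (and per minute for infusion). The sketch below only converts the stated ranges into absolute amounts for a hypothetical body weight; it is arithmetic on the ranges in this paragraph, and the tier labels and function names are illustrative.

```python
# Ranges from [0047]; bolus in mg per kg body weight, infusion in ug per kg per minute.
BOLUS_MG_PER_KG = {
    "broad": (0.001, 10.0),          # 1 ug/kg to 10 mg/kg
    "more preferred": (0.01, 5.0),   # 10 ug/kg to 5 mg/kg
    "most preferred": (0.1, 2.0),    # 0.1 to 2 mg/kg
}
INFUSION_UG_PER_KG_MIN = {
    "broad": (5.0, 20.0),
    "more preferred": (7.0, 15.0),
}


def bolus_range_mg(weight_kg: float, tier: str) -> tuple[float, float]:
    """Absolute bolus range in mg for a given body weight."""
    low, high = BOLUS_MG_PER_KG[tier]
    return low * weight_kg, high * weight_kg


def infusion_range_mg_per_h(weight_kg: float, tier: str) -> tuple[float, float]:
    """Absolute infusion range in mg/h: ug/kg/min x kg x 60 min/h / 1000 ug/mg."""
    low, high = INFUSION_UG_PER_KG_MIN[tier]
    return low * weight_kg * 60 / 1000, high * weight_kg * 60 / 1000


if __name__ == "__main__":
    weight = 70.0  # hypothetical body weight in kg
    print("bolus (mg):", bolus_range_mg(weight, "most preferred"))
    print("infusion (mg/h):", infusion_range_mg_per_h(weight, "broad"))
```

For a hypothetical 70 kg individual, the most preferred bolus range corresponds to 7 to 140 mg, and the broad infusion range to 21 to 84 mg per hour.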

[0048] With regard to antibodies, anti-sense nucleic acids, and ribozymes, a preferred mode of administration of the ‘medicament’ for treatment is the use of gene therapy to deliver the above-mentioned molecules. Gene therapy means treatment by the delivery of therapeutic nucleic acids to a patient's cells. This is extensively reviewed in Lever and Goodfellow (1995), Br. Med. Bull. 51, 1-242; Culver (1995); and Ledley, F. D. (1995), Hum. Gene Ther. 6, 1129. To achieve gene therapy, there must be a method of delivering genes to the patient's cells and additional methods to ensure the effective production of any therapeutic genes. Two general approaches exist to achieve gene delivery: non-viral delivery and virus-mediated gene delivery.

[0049] The following examples more fully illustrate preferred features of the invention, but should not be construed to limit the invention in any way.

EXAMPLES

[0050] Characterization of Nucleic Acid Sequences at Least Comprising a CACCT Sequence.

[0051] SIP1 and δEF1 Bind to Target Sites Containing One CACCT Sequence and One CACCTG Sequence

[0052] The DNA binding properties of SIP1 were studied. SIP1, a recently isolated Smad-interacting protein, belongs to the emerging family of two-handed zinc finger transcription factors (34). The organization of SIP1 is very similar to that of δEF1, the prototype member of this family. Both proteins contain two widely separated clusters of zinc fingers, which are involved in binding to DNA. The amino acid sequence homology is very high (more than 90%) within these two zinc finger clusters, whereas it is less evident in the other regions. This finding suggests that both proteins bind in an analogous fashion to similar DNA targets. Indeed, SIP1 as well as δEF1 bind with comparable affinities to many different target sites, which always contain two CACCT sequences.

[0053] SIP1_(FS) inhibits Xbra2 expression when over-expressed in the Xenopus embryo (34), and SIP1_(FS) binds to the Xbra2 promoter by contacting two CACCT sequences. Recent studies using Xenopus transgenic embryos have shown that 2.1 kb of Xbra2 promoter sequence suffices to express a reporter protein in the same domain as Xbra itself (17). However, a single point mutation within the downstream CACCT site (Xbra-D) of the promoter that disrupts SIP1 binding (as seen in gel retardation assays) has a severe effect: expression of the marker protein initiates earlier (i.e., at stage 9) and is now found at ectopic sites, for example, in the majority of ectodermal, mesodermal, and endodermal cells (17). This finding indicates that this nucleotide, which is located within the downstream CACCT site, is required for correct spatial and temporal expression of the Xbra2 gene. In addition, when a mutation is introduced in the upstream CACCT sequence, we observed the same premature and ectopic expression of Xbra2 as for the mutation within the downstream CACCT site. Therefore, mutations in either the downstream or the upstream CACCT that are known to affect SIP1 or δEF1 binding in EMSA give the same phenotype in vivo, indicating that a Xenopus δEF1-like protein participates in the regulation of the Xbra2 gene. In addition, these in vivo data support the conclusions from the in vitro binding experiments presented here: SIP1/δEF1-like transcription factors require two CACCT sites for regulating the expression of the Xbra2 promoter.

[0054] Not all promoter regions containing two CACCT sequences represent SIP1 or δEF1 binding sites. Notably, a duplicated Xbra-F probe, which contains the upstream CACCT sequence present in the Xbra-WT element, is refractory to binding by either SIP1 or δEF1. Moreover, neither SIP1_(NZF) nor SIP1_(CZF) can bind efficiently to this site (Xbra-F) as a monomer or as a dimer. Thus, other sequences in addition to CACCT may be required for generating a high-affinity binding site. It appears that CACCTG is always a better target site for binding of these zinc finger clusters. Indeed, the high-affinity CACCTG site (Xbra-E) was shown to bind either the SIP1_(NZF) or the SIP1_(CZF) cluster. In addition, modification of the CACCTG site into CACCTA strongly affects the binding of SIP1_(FS) and δEF1 to the Xbra promoter, confirming the importance of this 3′-guanine residue. By comparing the sequences of all the SIP1 and δEF1 target sites, a minimal consensus sequence was found, composed of one CACCT sequence and one CACCTG sequence, demonstrating that these two sequences are sufficient to form a high-affinity binding site for SIP1 or δEF1.
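The minimal consensus above (one CACCT plus one CACCTG, on either strand) can be checked mechanically. This sketch, assuming only exact matches count, scans a top-strand sequence for CACCT and its reverse complement AGGTG and flags sequences that meet the consensus. The names are illustrative, and the test sequences are hypothetical rather than the Table 1 probes. Because [0056] notes that the single-CACCT DC5 probe also binds δEF1, a negative result from this check does not rule out binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    start: int      # 0-based top-strand position of the 5-nt core (CACCT or AGGTG)
    strand: str     # "+" if CACCT reads on the top strand, "-" if it reads on the bottom strand
    extended: bool  # True if the core continues as CACCTG on its own strand


def find_boxes(seq: str) -> list[Box]:
    """Locate CACCT cores on both strands of a top-strand DNA sequence."""
    seq = seq.upper()
    boxes = []
    for i in range(len(seq) - 4):
        core = seq[i:i + 5]
        if core == "CACCT":
            # On the top strand the next 3' base is seq[i + 5].
            boxes.append(Box(i, "+", i + 5 < len(seq) and seq[i + 5] == "G"))
        elif core == "AGGTG":
            # Bottom-strand CACCT; CACCTG there reads as CAGGTG on the top strand.
            boxes.append(Box(i, "-", i > 0 and seq[i - 1] == "C"))
    return boxes


def has_minimal_consensus(seq: str) -> bool:
    """True if the sequence holds at least two CACCT cores, at least one of them CACCTG."""
    boxes = find_boxes(seq)
    return len(boxes) >= 2 and any(box.extended for box in boxes)


if __name__ == "__main__":
    print(has_minimal_consensus("TTCACCTGTTTTTTTTTTCACCTAA"))  # one CACCTG + one CACCT
    print(has_minimal_consensus("CACCTACACCTA"))               # two CACCTA, no CACCTG
```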

[0055] Although the upstream CACCT sequence is unable to bind SIP1_(CZF) or SIP1_(NZF), this sequence is contacted by full-size SIP1 in the context of the Xbra-WT probe. The upstream CACCT sequence is a prerequisite for the binding of SIP1_(FS) to the Xbra-WT probe. Thus, when the upstream CACCT sequence is combined with another, high-affinity CACCTG site (Xbra-E), this low-affinity site (Xbra-F) becomes committed to the binding of SIP1_(FS). A model is favored in which SIP1_(FS) contacts its target promoter via the binding of one of its zinc finger clusters to a high-affinity CACCTG sequence (e.g., Xbra-E), followed by the contact of the low-affinity CACCT site (Xbra-F) by the second cluster; this additional interaction strongly stabilizes SIP1 binding. Therefore, a CACCT site may still have an important function in the regulation of gene expression, even though on its own it binds neither SIP1_(NZF), SIP1_(CZF) nor SIP1_(FS).

[0056] The DC5 probe from the δ1-crystallin enhancer binds δEF1 specifically (31). However, this probe contains only one CACCT sequence. Therefore, although it has been demonstrated here that high-affinity binding sites for δEF1 should contain one CACCT sequence and one CACCTG sequence, it cannot be excluded that in particular cases, such as the DC5 probe, one CACCT site would be sufficient for the binding of this type of transcription factor.

[0057] Mode of SIP1 DNA Binding

[0058] When tested independently in EMSA, both the C-terminal and the N-terminal zinc finger clusters of SIP1 or δEF1 bind to very similar CACCT-containing consensus sequences. For both SIP1 and δEF1, NZF3 and NZF4 share extensive amino acid sequence homology with CZF2 and CZF3, respectively. This homology may explain why these two clusters can bind to similar consensus sequences. In addition, it has been shown that SIP1 or δEF1 require two CACCT sequences for binding to several potential target sites. Based on these results, it is surmised that SIP1 and δEF1 bind to their target elements in such a way that one zinc finger cluster contacts one of the CACCT sites, while the other cluster contacts the second CACCT site (see FIG. 2, “Model 1”). An alternative model could be that SIP1 or δEF1 homodimerizes before being able to bind to these target sites with high affinity (“Model 2”). The DNA binding capacity of SIP1_(NZF) is abolished by mutations in either NZF3 or NZF4. Similarly, mutations within CZF2 or CZF3 also affect the binding capacity of SIP1_(CZF). When these mutations are introduced in the context of full-size SIP1, binding of SIP1_(FS) is no longer observed. This observation indicates that the binding activity of both zinc finger clusters is required for the binding of SIP1_(FS) to its target element, which contains a doublet of CACCT sites. Similarly, it was previously shown that the integrity of both zinc finger clusters of δEF1 is needed for binding DNA (31). These observations indicate that both zinc finger clusters directly contact the DNA. Therefore, in the dimer model (FIG. 2, Model 2), the SIP1_(NZF) of one SIP1 molecule should bind to one CACCT sequence and the SIP1_(CZF) of the second SIP1 molecule should contact the other CACCT sequence. If such a dimer configuration exists, then certain combinations of full-size SIP1 molecules carrying different mutations within CZF or NZF, respectively, should allow the formation of a functional dimer able to bind to its target DNA. None of the possible combinations of the four SIP1_(FS) mutants tested (NZF3mut, NZF4mut, CZF2mut and CZF3mut) gave rise to a DNA/SIP1 complex in EMSAs. This finding argues against the existence of SIP1 dimers. In addition, using differently tagged SIP1_(FS) molecules, SIP1 dimers could not be detected in EMSAs, nor could such dimeric complexes be supershifted with different antibodies. Therefore, these results support “Model 1”, in which SIP1 binds as a monomer to a target site that contains one CACCT sequence and one CACCTG sequence.
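The dimer test in [0058] rests on a complementation argument. The sketch below is a toy model that assumes each mutant disables exactly one cluster and that mixed extracts can form any heterodimer. It enumerates the pairwise combinations of the four full-size mutants and lists which pairs each model predicts should shift the probe. Function names are illustrative.

```python
from itertools import combinations

# Full-size SIP1 mutants from [0058]; value = zinc finger cluster that the mutation inactivates.
MUTANTS = {"NZF3mut": "NZF", "NZF4mut": "NZF", "CZF2mut": "CZF", "CZF3mut": "CZF"}


def dimer_model_predicts_binding(a: str, b: str) -> bool:
    """Model 2: the NZF of one molecule and the CZF of the other contact the DNA.
    A mixed pair supplies both intact clusters only if the mutants disable different clusters."""
    return MUTANTS[a] != MUTANTS[b]


def monomer_model_predicts_binding(a: str, b: str) -> bool:
    """Model 1: each molecule binds alone and needs both of its own clusters, so no pair binds."""
    return False


if __name__ == "__main__":
    pairs = list(combinations(MUTANTS, 2))
    model2 = [p for p in pairs if dimer_model_predicts_binding(*p)]
    model1 = [p for p in pairs if monomer_model_predicts_binding(*p)]
    print(len(pairs), "pairs;", len(model2), "predicted to bind under Model 2;",
          len(model1), "under Model 1")
```

Under these assumptions, Model 2 predicts a complex for the four NZF/CZF mixed pairs out of six, whereas Model 1 predicts none. The text reports that no combination produced a complex, which is the Model 1 outcome.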

[0059] It has been shown herein that neither the relative orientation of the two CACCT sequences nor the spacing between these sequences is critical for the binding of SIP1_(FS) or δEF1. This demonstrates that these transcription factors should display a highly flexible secondary structure to accommodate binding to these different target sites. The long linker region between the two zinc finger clusters within SIP1 and δEF1 may permit this flexibility in the secondary structure of these proteins. These transcription factors can bind to sites containing CACCT sequences separated by as many as 44 bp (Ecad-WT), suggesting that a region of about 50 bp of promoter sequence might be covered, and therefore be less accessible to transcriptional activators, once SIP1_(FS) or δEF1 is bound to this promoter. This indicates that SIP1 or δEF1 could function as a transcriptional repressor by competing with transcriptional activators that bind in the region covered by SIP1 or δEF1.
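The four arrangements listed in the abstract (CACCT-N-CACCT, CACCT-N-AGGTG, AGGTG-N-CACCT and AGGTG-N-AGGTG) and the “Spacing” column of Table 1 can be derived from a sequence with a short scan. This sketch assumes exact 5-nt matches and takes spacing to be the number of nucleotides between the two cores, as defined for Table 1. Since orientation and spacing are reported above not to be critical, the output describes a site rather than predicting binding. The sequences are hypothetical and the function names illustrative.

```python
def cacct_boxes(seq: str) -> list[tuple[int, str]]:
    """(start, orientation) for each CACCT core, read on the top strand as CACCT or AGGTG."""
    seq = seq.upper()
    boxes = []
    for i in range(len(seq) - 4):
        core = seq[i:i + 5]
        if core in ("CACCT", "AGGTG"):  # AGGTG is CACCT on the bottom strand
            boxes.append((i, core))
    return boxes


def describe_pairs(seq: str) -> list[tuple[str, int]]:
    """Arrangement and spacing of each pair of neighbouring boxes, 5'->3' on the top strand.
    Spacing is the number of nucleotides strictly between the two 5-nt cores."""
    boxes = cacct_boxes(seq)
    return [
        (f"{o1}-N-{o2}", s2 - (s1 + 5))
        for (s1, o1), (s2, o2) in zip(boxes, boxes[1:])
    ]


if __name__ == "__main__":
    direct = "CACCT" + "T" * 24 + "CACCT"     # hypothetical direct repeat
    inverted = "CACCT" + "T" * 14 + "AGGTG"   # hypothetical inverted arrangement
    print(describe_pairs(direct))
    print(describe_pairs(inverted))
```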

[0060] Other Families of Transcription Factors May Bind DNA with a Similar Mechanism as SIP1

[0061] This new mode of DNA binding may also be generalized to other transcription factor families which, like SIP1 and δEF1, contain separated clusters of zinc fingers, such as the MBP/PRDII-BF1 family (Refs. 1, 3, 6, 29, 33). As with SIP1 and δEF1, the conservation of these zinc finger clusters is very strong between the different members of this family (1). In addition, the C-terminal cluster is very homologous to the N-terminal cluster and, in the case of PRDII-BF1, these clusters bind to the same sequences when tested independently (3). Therefore, this type of transcription factor may bind to two reiterated sequences through the contact of one zinc finger cluster with one sequence and of the other cluster with the second sequence. Similarly, the different members of the NZF family of transcription factors also have two widely separated clusters of zinc fingers (Refs. 11, 12, 36). MyT1, NZF-1 and NZF-3 all bind to the same consensus element AAAGTTT (SEQ ID NO: ______ ). Just as SIP1 and δEF1 show a significantly higher affinity for elements containing two CACCT sequences, an element containing two AAAGTTT sequences showed a markedly higher affinity for NZF-3 (36). This suggests that two AAAGTTT sequences are needed to create a high-affinity binding site for these transcription factors, and that they may bind DNA with a mechanism similar to that of SIP1 and δEF1. Finally, the Evi-1 protein, which contains 7 zinc fingers at the N-terminus and 3 zinc fingers at the C-terminus, binds to two consensus sequences. It binds to a complex consensus sequence (GACAAGATAAGATAA-N₁₋₂₈-CTCATCTTC (SEQ ID NO: 6)) via a mechanism that may involve the binding of the N-terminal zinc finger cluster to the first part and the binding of the C-terminal cluster to the second part (20). In conclusion, the mode of DNA binding described here may not only be applicable to the SIP1/δEF1 family of transcription factors, but appears to be more universal.
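Both the NZF-family doublet and the Evi-1 bipartite site can be expressed as simple sequence patterns. This sketch, assuming exact matches, scans both strands for the Evi-1 consensus GACAAGATAAGATAA-N₁₋₂₈-CTCATCTTC with a regular expression and counts AAAGTTT cores on either strand. It only locates candidate elements and says nothing about affinity. Function names and test sequences are hypothetical.

```python
import re

# Evi-1 bipartite consensus from [0061]: GACAAGATAAGATAA, a 1-28 nt spacer, then CTCATCTTC.
EVI1_SITE = re.compile(r"GACAAGATAAGATAA([ACGT]{1,28})CTCATCTTC")
NZF_CORE = "AAAGTTT"  # MyT1/NZF-1/NZF-3 element; its reverse complement is AAACTTT


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def evi1_sites(seq: str) -> list[tuple[str, int, int]]:
    """(strand, start on that strand, spacer length) for each Evi-1 consensus match."""
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in EVI1_SITE.finditer(s):
            hits.append((strand, m.start(), len(m.group(1))))
    return hits


def count_nzf_cores(seq: str) -> int:
    """Number of AAAGTTT cores on either strand (exact, non-overlapping matches)."""
    s = seq.upper()
    return s.count(NZF_CORE) + s.count(reverse_complement(NZF_CORE))


if __name__ == "__main__":
    print(evi1_sites("GACAAGATAAGATAA" + "C" * 10 + "CTCATCTTC"))  # hypothetical, 10-nt spacer
    print(count_nzf_cores("AAAGTTTCCAAACTTT"))                     # one core on each strand
```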

[0062] SIP1 was cloned as a Smad1-interacting protein but was also shown to interact with Smad2, 3 and 5 (34). Smad proteins are signal transducers involved in the BMP/TGF-β signaling cascade (13). Upon binding of TGF-β ligands to the serine/threonine kinase receptor complex, the receptor-regulated Smad proteins are phosphorylated by type I receptors and migrate to the cell nucleus, where they modulate transcription of target genes. The interaction between SIP1 and Smads has only been observed upon ligand stimulation, indicating that Smads need to be activated before they are capable of interacting with SIP1 (34). Surprisingly, Evi-1, a transcription factor that may bind DNA with a mechanism similar to that of SIP1, is a Smad3-interacting protein (15). So far, it was shown that Evi-1 inhibited the binding of Smad3 to DNA, but certainly has an effect on target promoters of Evi-1. Schnurri, which is the Drosophila homologue of the human PRDII-BF1 transcription factor, is a protein that may also bind DNA with a mechanism similar to that of SIP1. Interestingly, Schnurri was proposed to be a nuclear protein target in the dpp signaling pathway (1, 6). Dpp is a member of the TGF-β family. This makes Schnurri a candidate nuclear target for the Drosophila Mad protein, the Drosophila homologue of vertebrate Smads. Therefore, the mode of DNA binding employed by SIP1 may be generalized to other zinc finger-containing Smad-interacting proteins and may represent a common feature of several Smad partners in the nucleus.

[0063] These results demonstrate a novel mode of DNA binding for the δEF1 family of transcription factors. This mode of DNA binding is also relevant to other families of transcription factors that contain separated clusters of zinc fingers.

[0064] Materials and Methods

[0065] Plasmid Constructions.

[0066] For expression in mammalian cells, the SIP1 (34) and δEF1 (5) cDNAs were subcloned into pCS3 (27). In this plasmid, the SIP1 and δEF1 open reading frames are fused to a (Myc)₆ tag at the N-terminus. SIP1 cDNA was also cloned into pcDNA3 (Invitrogen) as an N-terminal fusion with the FLAG tag. For the expression of SIP1_(NZF) and SIP1_(CZF), we subcloned into pCS3 the cDNA fragments encoding amino acids 1 to 389 and 977 to 1214, respectively. SIP1_(CZF) (amino acids 957 to 1156) and SIP1_(NZF) (amino acids 90 to 383) were also produced in E. coli as GST fusion proteins (in pGEX-5X-1, Pharmacia) and purified using the GST purification module (Pharmacia). Mutations identical to those made in AREB6 (10) were also introduced in the SIP1 zinc fingers. Mutagenesis of zinc fingers NZF3, NZF4, CZF2 and CZF3 involved substitution of their third His by Ser. These mutations were introduced using a PCR-based approach with the following primers:

[0067] SIP1_(NZF3Mut), 5′-CCACCTGAAAGAATCCCTGAGAATTCACAG (SEQ ID NO: 7);

[0068] SIP1_(NZF4Mut), 5′-GGGTCCTACAGTTCATCTATCAGCAGCAAG (SEQ ID NO: 8);

[0069] SIP1_(CZF2Mut), 5′-CACCACCTTATCGAGTCCTCGAGGCTGCAC (SEQ ID NO: 9);

[0070] SIP1_(CZF3Mut), 5′-TCCTACTCGCAGTCCATGAATCACAGGTAC (SEQ ID NO: 10).
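The text does not specify the PCR scheme or the annealing conditions used with these primers. The sketch below only computes simple properties of the four listed primers: length, GC fraction, and a rough melting temperature from the basic formula Tm = 64.9 + 41 × (nGC − 16.4)/N. This formula is an approximation chosen here, not the method of the original work. The sketch also computes the reverse complement, which would be the sequence of a complementary partner primer if an overlap-extension scheme were used; that scheme is an assumption. Function names are illustrative.

```python
PRIMERS = {
    "SIP1_NZF3Mut": "CCACCTGAAAGAATCCCTGAGAATTCACAG",  # SEQ ID NO: 7
    "SIP1_NZF4Mut": "GGGTCCTACAGTTCATCTATCAGCAGCAAG",  # SEQ ID NO: 8
    "SIP1_CZF2Mut": "CACCACCTTATCGAGTCCTCGAGGCTGCAC",  # SEQ ID NO: 9
    "SIP1_CZF3Mut": "TCCTACTCGCAGTCCATGAATCACAGGTAC",  # SEQ ID NO: 10
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tm_basic(seq: str) -> float:
    """Rough Tm in deg C from the basic GC formula; ignores salt and primer concentration."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{tm_basic(seq):.1f} C")
        print("  reverse complement:", reverse_complement(seq))
```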

[0071] The respective mutated clusters were re-cloned into full-size SIP1 in pCS3 in order to produce in mammalian cells the mutated SIP1 proteins named NZF3mut, NZF4mut, CZF2mut and CZF3mut, respectively. Furthermore, these mutated clusters were subcloned into pGEX-5X-2 (Pharmacia) and produced in E. coli as GST fusion proteins (GST-NZF3mut, GST-NZF4mut, GST-CZF2mut and GST-CZF3mut). All constructs were confirmed by restriction mapping and sequencing.

[0072] Cell Culture and DNA Transfection.

[0073] COS1 cells were grown in DMEM supplemented with 10% fetal bovine serum. Cells were transfected using Fugene according to the manufacturer's protocol (Boehringer Mannheim) and collected 30-48 h after transfection.

[0074] Gel Retardation Assay.

[0075] The Xbra-WT oligonucleotide covers the region from −344 to −294 of the Xbra2 promoter (16). The region between −412 and −352 of the α4-integrin promoter is present within the α4I-WT oligonucleotide (26). The Ecad-WT probe contains the region between −86 and −17 of the human E-cadherin promoter (2). The sequences of the upper strands of the wild-type and mutated double-stranded probes are listed in Table 1. Double-stranded oligonucleotides were labeled with [γ-³²P]ATP and T4 polynucleotide kinase (New England Biolabs). Total cell extracts were prepared from COS1 cells (25) transfected with different pCS3 vectors allowing synthesis of full-length SIP1, full-length δEF1, and different mutant forms of SIP1 (25), or co-production of equal amounts of Myc-tagged SIP1 and FLAG-tagged SIP1. GST-SIP1 fusion proteins were purified from E. coli extract using the GST purification module (Pharmacia) and tested in gel retardation. The DNA binding assay (20 μl) was performed at 25° C. with 1 μg of COS1 total cell protein, 1 pg of poly dI-dC, and 10 pg of ³²P-labeled double-stranded oligonucleotide (approx. 10⁴ Cerenkov counts) in the δEF1 binding buffer described previously (30). For supershift experiments, the extracts were incubated with anti-Myc (Santa Cruz) or anti-FLAG (Kodak) antibodies. For competition, an excess of unlabeled double-stranded oligonucleotides was added together with the labeled probe. The binding reaction was loaded onto a 4% polyacrylamide gel (acrylamide/bis-acrylamide, 19:1) prepared in 0.5× TBE buffer. Following electrophoresis, gels were dried and exposed to X-ray film. All experiments were repeated at least three times.

[0076] Methylation Interference Assay.

[0077] The upper and lower strands of the Xbra-WT probe were labeled separately and annealed with an excess of the complementary DNA strand. The probes were precipitated and treated with dimethyl sulfate (8). The methylated probe (10⁵ Cerenkov counts) was incubated in a 10× gel retardation reaction (see above; 200 μl final volume) with 10 μg of total cell extract from COS1 cells expressing either SIP1_(FS) or SIP1_(CZF). After 20 min of incubation at 25° C., the products were loaded onto a 4% polyacrylamide gel, and electrophoresis was performed as for the gel retardation assay. Subsequently, the gel was blotted onto a DEAE-cellulose membrane; the transfer was performed at 100 V for 30 min in 0.5× TBE buffer. The membrane was then exposed for one hour, and the bands corresponding to SIP1_(FS) (or SIP1_(CZF)) and to the free probe were eluted at 65° C. using high-salt conditions (1 M NaCl, 20 mM Tris, pH 7.5, 1 mM EDTA). The eluted DNA was precipitated and treated with piperidine (18). After several cycles of solubilization in water and evaporation of the liquid under vacuum, the resulting DNA pellet was dissolved in 10 μl of sequencing buffer (97.5% deionized formamide, 0.3% each bromophenol blue and xylene cyanol, 10 mM EDTA) and denatured for 5 min at 85° C. The same amount of counts (1,500 Cerenkov counts) for the free probe and the bound probe was loaded onto a 20% polyacrylamide-8 M urea sequencing gel. The gel was run in 0.5× TBE for one hour at 2,000 V. Thereafter, the gel was fixed in 50% methanol/10% acetic acid and dried. The gel was then exposed for autoradiography.
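In a methylation interference experiment, guanines whose methylation prevents binding are under-represented in the bound fraction relative to the free probe. Because equal counts of free and bound DNA are loaded, lane-normalized band intensities can be compared position by position. The sketch below assumes band intensities have already been quantified (for example by densitometry) and uses an arbitrary 0.5 cut-off to flag such positions. The intensities in the usage code are hypothetical and are not data from this study; the function name is illustrative.

```python
def contact_guanines(free: list[float], bound: list[float], threshold: float = 0.5) -> list[int]:
    """Indices of G positions under-represented in the bound lane relative to the free lane.

    Each lane is normalized to its own total signal (equal counts were loaded), and a
    position is flagged when its bound/free ratio falls below `threshold`.
    """
    if len(free) != len(bound):
        raise ValueError("both lanes must report the same G positions")
    free_total, bound_total = sum(free), sum(bound)
    if free_total <= 0 or bound_total <= 0:
        raise ValueError("each lane needs a positive total signal")
    flagged = []
    for pos, (f, b) in enumerate(zip(free, bound)):
        if f == 0:
            continue  # no signal in the free lane; ratio undefined
        ratio = (b / bound_total) / (f / free_total)
        if ratio < threshold:
            flagged.append(pos)
    return flagged


if __name__ == "__main__":
    # Hypothetical band intensities for five G positions along one strand.
    free_lane = [100, 120, 90, 110, 100]
    bound_lane = [105, 20, 95, 15, 110]
    print(contact_guanines(free_lane, bound_lane))
```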

[0078] Western Blot Analysis.

[0079] Transfected cells were washed with PBS-O (137 mM NaCl, 2.7 mM KCl, 6.5 mM Na₂HPO₄, 1.5 mM KH₂PO₄), collected in detachment buffer (10 mM Tris pH 7.5, 1 mM EDTA, 10% glycerol, with protease inhibitors (Protease Inhibitor Cocktail tablets, Boehringer Mannheim)) and pelleted by low-speed centrifugation. The cells were then solubilized in 10 mM Tris, pH 7.4, 125 mM NaCl, 1% Triton X-100. For direct electrophoretic analysis, gel sample buffer was added to the cell lysates and the samples were boiled. For other experiments, lysates were first subjected to immunoprecipitation with either anti-Myc or anti-FLAG antibodies. Antibodies were added to aliquots of the cell lysates, which were incubated overnight at 4° C. The antibodies and the bound protein(s) of the cell lysate were coupled as a complex to protein A-Sepharose for 2 hours at 4° C. The immunoprecipitates were washed 4 times in NET buffer (50 mM Tris pH 8.0, 150 mM NaCl, 0.1% NP40, 1 mM EDTA, 0.25% gelatin), resolved by SDS-polyacrylamide (7.5%) gel electrophoresis, and electrophoretically transferred to nitrocellulose membranes. Membranes were blocked for 2 hours in TBST (10 mM Tris pH 7.5, 150 mM NaCl, 0.1% TWEEN-20) containing 3% (w/v) non-fat milk and incubated with primary antibody (1 μg/ml) for 2 hours, followed by secondary antibody (0.5 μg/ml) linked to horseradish peroxidase. Immunoreactive bands were detected with an enhanced chemiluminescence reagent (NEN).
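The PBS-O composition above is given in millimolar, and converting it to weighed amounts is a routine calculation. This sketch assumes anhydrous salts and the molar masses listed in the code (a hydrate such as Na₂HPO₄·7H₂O would need its own molar mass). It does not cover pH adjustment, and the names are illustrative.

```python
# Molar masses in g/mol; anhydrous salts are assumed.
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}
PBS_O_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 6.5, "KH2PO4": 1.5}  # from [0079]


def grams_needed(recipe_mm: dict[str, float], volume_l: float) -> dict[str, float]:
    """Grams of each salt for `volume_l` litres: mM / 1000 x g/mol x L."""
    return {salt: mm / 1000 * MOLAR_MASS[salt] * volume_l for salt, mm in recipe_mm.items()}


if __name__ == "__main__":
    for salt, grams in grams_needed(PBS_O_MM, 1.0).items():
        print(f"{salt}: {grams:.3f} g per litre")
```

For one litre this gives about 8.006 g NaCl, 0.201 g KCl, 0.923 g Na₂HPO₄ and 0.204 g KH₂PO₄.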

[0080] Xenopus laevis Transgenesis and Whole-mount in Situ Hybridization

[0081] Xenopus embryos transgenic for Xbra2-GFP were generated as described previously (Kroll and Amaya, 1996), with the following modifications. A Drummond Nanoject was used to inject a fixed volume of 5 nl of sperm nuclei suspension per egg, at a theoretical concentration of 2 nuclei per 5 nl. NotI was used for plasmid linearization and nicking of sperm nuclei. Approximately 800 eggs were injected per egg extract incubation. Cleavage rates of the injected eggs ranged from 10% to 30%. Of these, 50 to 80% completed gastrulation, and 20 to 30% developed further into normal swimming tadpoles, if allowed. The transgenic frequency, as analyzed by expression, varied between 50 and 90%. Embryos were staged according to Nieuwkoop and Faber (1967). A minimum of 30 expressing embryos were analyzed per construct and stage shown. Whole-mount in situ hybridization for the GFP reporter gene was performed as described previously (Latinkic et al., 1997). After color detection, embryos were dehydrated and cleared in a 2:1 mixture of benzyl alcohol/benzyl benzoate.
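The efficiencies reported above can be chained to estimate how many expressing embryos one session of about 800 injected eggs yields, and hence how many sessions are needed to reach the minimum of 30 expressing embryos per construct. This is back-of-the-envelope arithmetic under a stated assumption: the text does not say which population the 50-90% transgenic frequency refers to, and the sketch applies it to embryos that completed gastrulation. Function names are illustrative.

```python
import math


def expressing_embryos(eggs: int = 800,
                       cleavage: tuple[float, float] = (0.10, 0.30),
                       gastrulation: tuple[float, float] = (0.50, 0.80),
                       transgenic: tuple[float, float] = (0.50, 0.90)) -> tuple[float, float]:
    """Low and high estimates of expressing embryos from one injection session.
    Assumes the transgenic frequency applies to embryos that completed gastrulation."""
    low = eggs * cleavage[0] * gastrulation[0] * transgenic[0]
    high = eggs * cleavage[1] * gastrulation[1] * transgenic[1]
    return low, high


def sessions_needed(target: int, per_session: float) -> int:
    """Injection sessions required to reach `target` expressing embryos at a given yield."""
    return math.ceil(target / per_session)


if __name__ == "__main__":
    low, high = expressing_embryos()
    print(f"expressing embryos per session: {low:.0f}-{high:.0f}")
    print("sessions for 30 embryos:", sessions_needed(30, low), "(worst case),",
          sessions_needed(30, high), "(best case)")
```

Under this assumption, one session yields roughly 20 to 173 expressing embryos, so reaching 30 takes one or two sessions.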

[0082] Table 1 lists the probes used herein (see also the sequence listing, which is incorporated herein). The “Spacing” column gives the number of nucleotides present between two CACCT sequences. In the corresponding Table 1 of the incorporated parent PCT International Patent application, the CACCT sequences are highlighted in bold. In that Table, the underlined gaps correspond to deletions of nucleotides from the wild-type probes. For some probes, only the residues that were changed in comparison to the wild-type probes were indicated in order to facilitate interpretation of the introduced mutations.

TABLE 1
Oligo            Sequence          Spacing (nt)
Xbra-WT          SEQ ID NO: 11     24
Xbra-D           SEQ ID NO: 12
Xbra-E           SEQ ID NO: 13
Xbra-F           SEQ ID NO: 14
Rdm + Xbra-E     SEQ ID NO: 15
Xbra-F + AREB6   SEQ ID NO: 16     23
Rdm + AREB6      SEQ ID NO: 17
Xbra-J           SEQ ID NO: 18
Xbra-K           SEQ ID NO: 19
Xbra-L           SEQ ID NO: 20
Xbra-M           SEQ ID NO: 21
Xbra-N           SEQ ID NO: 22
Xbra-O           SEQ ID NO: 23
Xbra-P           SEQ ID NO: 24
Xbra-Q           SEQ ID NO: 25
Xbra-R           SEQ ID NO: 26
Xbra-S           SEQ ID NO: 27
Xbra-Z           SEQ ID NO: 28
Xbra-B           SEQ ID NO: 29     21
Xbra-C           SEQ ID NO: 30     21
Xbra-U           SEQ ID NO: 31     14
Xbra-EE          SEQ ID NO: 32     18
Xbra-ErE         SEQ ID NO: 33     20
Xbra-FrF         SEQ ID NO: 34     24
Xbra-V           SEQ ID NO: 35     24
Xbra-W           SEQ ID NO: 36     24
α4I-WT           SEQ ID NO: 37     34
α4I-A            SEQ ID NO: 38
α4I-B            SEQ ID NO: 39
Ecad-WT          SEQ ID NO: 40     44
Ecad-A           SEQ ID NO: 41
Ecad-B           SEQ ID NO: 42

[0083] Further Materials and Methods:

[0084] Gel retardation assay with different probes from the Xbra2 promoter: the different ³²P-labeled Xbra probes (10 pg) were incubated with 1 μg of total protein extract from COS1 cells transfected with pCS3-SIP1_(CZF) or with pCS3-SIP1_(FS), or from mock-transfected cells.

[0085] Two CACCT sites are contacted upon binding of SIP1_(FS) to the Xbra2 promoter: only mutations within the upstream CACCT sequence (as revealed by scanning mutagenesis, see Table 1) or the downstream CACCT sequence of Xbra-WT abolish SIP1_(FS) binding. A methylation interference assay indicates that SIP1_(FS) contacts both CACCT sequences. Xbra-WT, labeled in either the upper or the lower strand, was methylated and incubated with total extract from COS1 cells transfected with either pCS3-SIP1_(FS) or pCS3-SIP1_(CZF). The DNA retarded in the shifted complex and the unbound DNA (FREE) were purified, cleaved with piperidine and run on a sequencing gel. Guanine residues are methylated in the free probe. The upstream and the downstream CACCT sequences from the Xbra2 promoter are indicated.

[0086] Two CACCT sequences are necessary for the binding of SIP1_(FS) and δEF1 to the Xbra2, α4-integrin and E-cadherin promoters. The experiments comprised: δEF1 binding to the Xbra2 promoter; SIP1 and δEF1 binding to the α4-integrin promoter; binding of SIP1 and δEF1 to the α4-integrin promoter, including competition with an excess of non-labeled wild-type and mutated binding sites; and binding of SIP1 and δEF1 to the E-cadherin promoter. In each binding reaction, 10 pg of labeled probe was incubated with 1 μg of a total cell protein extract prepared from COS1 cells transfected with either pCS3-SIP1_(FS) or pCS3-δEF1. In the competition experiments, 5 ng and 50 ng of unlabeled DNA were added at the same time as the labeled probe. Myc-tag-directed antibody was added to the binding reaction; the supershifted complex and the δEF1- and SIP1-retarded complexes are indicated. For the sequences of all probes, see Table 1 and the sequence listing.
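The competitor amounts above are given by mass. This sketch converts them to fold molar excess over the 10 pg labeled probe. When no lengths are supplied it assumes the unlabeled competitor has the same length as the probe, in which case 5 ng and 50 ng correspond to 500-fold and 5,000-fold excess. The function name is illustrative.

```python
from typing import Optional


def molar_excess(competitor_ng: float, probe_pg: float,
                 competitor_bp: Optional[int] = None, probe_bp: Optional[int] = None) -> float:
    """Fold molar excess of unlabeled competitor over labeled probe.
    With no lengths given, equal lengths are assumed, so the molar ratio equals the mass ratio."""
    ratio = competitor_ng * 1000.0 / probe_pg  # convert ng to pg before dividing
    if competitor_bp is not None and probe_bp is not None:
        # Moles scale with mass / length for double-stranded DNA of similar composition.
        ratio *= probe_bp / competitor_bp
    return ratio


if __name__ == "__main__":
    for ng in (5, 50):
        print(f"{ng} ng competitor vs 10 pg probe: {molar_excess(ng, 10):,.0f}-fold")
```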

[0087] The spacing and the relative orientation of the CACCT sequences are not critical for the binding of SIP1_(FS) and δEF1 to the Xbra2 promoter: ten pg of labeled probe was incubated with 1 μg of a total cell protein extract prepared from COS1 cells transfected with either pCS3-SIP1_(FS) or pCS3-δEF1. We used 10 pg of the Xbra-E probe and 10 pg of the Xbra-F probe in the same binding reaction. For clarity and ease of comparison, the free probe is omitted from the SIP1 binding reactions shown.

[0088] The integrity of both SIP1 zinc finger clusters is necessary for the binding of SIP1_(FS) to DNA: mutations within NZF3, NZF4, CZF2 or CZF3 abolish the DNA-binding activity of the SIP1_(NZF) or SIP1_(CZF) zinc finger cluster. The wild-type and mutated zinc finger clusters were fused to GST, and the fusion proteins were produced in E. coli. After purification, an equal amount of each fusion protein (0.1 ng) was incubated with 10 pg of labeled Xbra-E probe. Mutations within NZF3, NZF4, CZF2 or CZF3 affect the binding of SIP1_(FS) to the Xbra-WT probe. Ten pg of labeled Xbra-WT probe was incubated with 1 μg of a total cell protein extract prepared from COS1 cells transfected with either pCS3-SIP1_(FS), pCS3-SIP1_(NZF3mut), pCS3-SIP1_(NZF4mut), pCS3-SIP1_(CZF2mut) or pCS3-SIP1_(CZF3mut). All possible combinations of two COS cell extracts (1 μg of each) expressing different SIP1 mutants were tested. Myc-tag-directed antibody was added to the binding reaction; the supershifted complex and the SIP1_(FS)-retarded complex are indicated. Mutations within NZF3, NZF4, CZF2 or CZF3 abolish the binding of SIP1_(FS) to the α4-integrin promoter. Ten pg of labeled α4I-WT probe was incubated with 1 μg of a total cell protein extract prepared from COS1 cells transfected with either pCS3-SIP1_(FS), pCS3-SIP1_(NZF3mut), pCS3-SIP1_(NZF4mut), pCS3-SIP1_(CZF2mut) or pCS3-SIP1_(CZF3mut). Myc-tag-directed antibodies were added to the binding reaction; the supershifted complex and the SIP1_(FS)-retarded complex are indicated. The SIP1 mutants are produced in comparable amounts in COS cells: ten μg of the COS cell total extract was analyzed by Western blotting using the anti-Myc antibody. SIP1 mutant expression levels are in fact slightly higher than the SIP1-WT expression level.

[0089] SIP1_(FS) binds as a monomer to the Xbra-WT probe.

[0090] Ten pg of labeled Xbra-WT probe were incubated with 1 μg of total cell protein prepared from COS1 cells transfected with equal amounts of pCS3-SIP1_(FS) (Myc-tagged) and pCDNA3-SIP1 (Flag-tagged). Anti-Flag and anti-Myc antibodies were added to the binding assay either separately or together. The Flag- and the Myc-supershifted complexes are indicated.

[0091] The integrity of both CZF and NZF is necessary for SIP1 repressor activity.

[0092] SIP1_(FS) binding to a gel-purified fragment derived from the multiple CACCT-containing artificial promoter of the reporter plasmid p3TP-Lux. Anti-Myc tag antibody was added; the supershifted complex is indicated. A co-transfection assay of pCS3-SIP1_(FS), pCS3-CZF3-Mut or pCS3-NZF3-Mut together with the p3TP-Lux reporter vector was conducted. The activity is expressed as a percentage of the full SIP1_(FS) repressor activity, which is set at 100%.
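The percentage scale in [0092] can be illustrated with a small calculation. This is a minimal sketch that assumes the repressor activity of a variant is its drop in reporter signal divided by the drop obtained with wild-type SIP1_(FS). The function name and the formula are hypothetical; the document does not state how the percentages were derived.

```python
def relative_repressor_activity(basal: float, with_wt: float, with_mutant: float) -> float:
    """Repression by a SIP1 variant as a percentage of wild-type SIP1_(FS) repression.

    basal:       p3TP-Lux reporter signal without SIP1
    with_wt:     reporter signal with wild-type SIP1_(FS)
    with_mutant: reporter signal with a mutant (e.g. CZF3-Mut or NZF3-Mut)
    All three values are assumed to be normalized the same way (e.g. to a
    co-transfected control); that normalization is an assumption.
    """
    wt_repression = basal - with_wt
    if wt_repression <= 0:
        raise ValueError("wild-type SIP1_(FS) shows no repression")
    return 100.0 * (basal - with_mutant) / wt_repression
```

By construction, wild type scores 100%, and a mutant that leaves the reporter at its basal level scores 0%.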

[0093] Ectopic activity of the mutated Xbra2 promoter variants (Xbra2-Mut) in transgenic frog embryos: SIP1_(FS) binding to the wild-type and mutated Xbra2 promoter elements. Whole-mount in situ hybridization for GFP mRNA of Xenopus embryos transgenic for a wild-type or point-mutated 2.1 kb Xbra2 promoter fragment driving a GFP reporter. All embryos were fixed at stage 11 and cleared for better visualization of the signal. Percentages refer to the intermediate phenotype (i.e., 35% of transgenic embryos displayed the normal Xbra2 expression pattern and 65% showed ectopic expression).

SIP1 has a structure similar to δEF1

[0094] SIP1 was recently isolated as a Smad-binding protein. It binds Smad1, Smad5 and Smad2 in a ligand-dependent fashion (in the BMP and activin pathways) (34). SIP1 is a new member of the family of two-handed zinc finger/homeodomain transcription factors, which includes vertebrate δEF1 and Drosophila Zfh-1 (4, 5). Like these, SIP1 contains two widely separated zinc finger clusters. One cluster of four zinc fingers (3 CCHH and 1 CCHC fingers) is located in the N-terminal region of the protein, and another cluster of three CCHH zinc fingers is present in the C-terminal region (FIG. 1A). Between SIP1 and δEF1, a high degree of sequence identity is apparent within the N-terminal zinc finger cluster (87%) and the C-terminal zinc finger cluster (97%) (see FIG. 1B), whereas the two proteins are less conserved in the regions outside the zinc finger clusters (34). Therefore, we assumed that SIP1 and δEF1 would bind to very similar sequences. In addition, the N-terminal and C-terminal zinc finger clusters of δEF1 bind to very similar sequences, which contain the core CACCT consensus sequence (10). Within the N-terminal cluster, δEF1_(NZF3) and δEF1_(NZF4) are the main determinants for binding to the CACCT consensus sequence, and δEF1_(CZF2) and δEF1_(CZF3) are required for the binding of the C-terminal cluster (10). Moreover, the δEF1_(NZF3+NZF4) domain shows high homology (67%) with the δEF1_(CZF2+CZF3) domain, which may explain why these two clusters bind to similar consensus target sites on DNA (FIG. 1C). All the residues essential for binding that are conserved between δEF1_(NZF3+NZF4) and δEF1_(CZF2+CZF3) are also conserved between SIP1_(NZF3+NZF4) and SIP1_(CZF2+CZF3). Taken together, these comparisons suggest that the N- and C-terminal zinc finger clusters of SIP1 would also bind to very similar target sequences.

[0095] Two CACCT sites are necessary for the binding of SIP1 to the Xbra2 promoter: SIP1 binds to the Xenopus Xbra2 promoter and represses expression of Xbra2 mRNA when overexpressed in the Xenopus embryo (34). The Xbra2 promoter contains several CACCT sequences, two of which are localized in a region (−381 to −231) necessary for induction by activin (16). These two sites, an upstream CACCT and a downstream AGGTG (i.e., 5′-CACCT on the other DNA strand), are separated by 24 bp.
To further elucidate the binding requirements of SIP1 for these sites, a corresponding 50 bp-long oligonucleotide (Xbra-WT) was used as a probe in electrophoretic mobility shift assays (EMSAs). The Xbra-D probe, which contains a mutation of the downstream AGGTG site to AGATG, was also included. A similar mutation was previously shown to abolish the binding of δEF1 to the κE2 enhancer (30). In addition, we tested the downstream site (probe Xbra-E) and the upstream site (probe Xbra-F) independently as shorter probes. These probes were incubated with total extracts of COS cells expressing the Myc-tagged C-terminal zinc finger cluster of SIP1 (SIP1_(CZF)), the Myc-tagged N-terminal zinc finger cluster of SIP1 (SIP1_(NZF)), or Myc-tagged full-size SIP1 (SIP1_(FS)).
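Locating the two sites described in [0095] amounts to scanning both strands for the CACCT core. A top-strand AGGTG is CACCT read on the bottom strand. The sketch below is a minimal illustration under stated assumptions. The probe is a hypothetical toy sequence, since the Table 1 sequences are not reproduced in this section. "Spacing" is assumed to mean the number of bases between the two 5 bp cores.

```python
import re

CORE = "CACCT"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cacct_cores(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, top-strand motif) for each CACCT core on either strand.

    A top-strand 'AGGTG' is a CACCT core read on the bottom strand.
    """
    seq = seq.upper()
    hits = []
    for motif in (CORE, revcomp(CORE)):
        # zero-width lookahead so that overlapping occurrences are all reported
        hits += [(m.start(), motif) for m in re.finditer(f"(?={motif})", seq)]
    return sorted(hits)


def core_spacing(first_start: int, second_start: int) -> int:
    """Bases between two 5 bp cores (assumed definition of 'spacing')."""
    return second_start - (first_start + len(CORE))


# Hypothetical probe: an upstream CACCT and a downstream AGGTG 24 bp apart
probe = "GG" + "CACCT" + "A" * 24 + "AGGTG" + "CC"
(up, up_motif), (down, down_motif) = find_cacct_cores(probe)
print(up_motif, down_motif, core_spacing(up, down))  # CACCT AGGTG 24
```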

[0096] When extracts of mock-transfected COS cells are used as a control with the A probe, two weak complexes and one strong complex are visualized. Using competitor oligonucleotides, the two weak complexes turned out to be non-specific, whereas the strong, fast-migrating complex shows specific binding to the Xbra probe. The latter observation suggests that COS cells contain an endogenous protein that can bind to the Xbra-WT probe. When SIP1_(CZF) was present in the extract, we observed a strong, slow-migrating complex in addition to the endogenous binding activity of the COS extract. This complex could be supershifted with an anti-Myc antibody, which confirms that it results from binding of SIP1_(CZF) to the Xbra-WT probe. Mutation of the downstream site (Xbra-D probe) strongly affected the formation of this SIP1_(CZF) complex. Moreover, SIP1_(CZF) binds to the Xbra-E probe but not to the Xbra-F probe, indicating that the downstream site is essential for binding of SIP1_(CZF) and that SIP1_(CZF) may bind exclusively to this site. The strong complex visualized with the Xbra-F probe was also present in SIP1_(FS) and mock extracts and originates from a hitherto uncharacterized endogenous COS cell protein that binds to the Xbra-F probe. In addition, COS cell extracts containing SIP1_(NZF) displayed binding patterns in EMSAs similar to those obtained with SIP1_(CZF). It is apparent that, as in δEF1 (10), both zinc finger clusters of SIP1 have similar DNA-binding features.

[0097] A strong complex, corresponding to SIP1_(FS), is also generated with the Xbra-WT probe. It should be noted that the SIP1_(CZF) production level in COS cells is approximately 50-fold higher than the SIP1_(FS) level. For each EMSA reaction, we used the same amount of crude COS cell protein. The binding of SIP1_(FS) to the Xbra-WT probe is as strong as the binding of SIP1_(CZF). Because SIP1_(FS) produces an equally strong complex although it is present at roughly 50-fold lower levels, its affinity for Xbra-WT is at least 50 times higher than that of SIP1_(CZF).
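The 50-fold argument in [0097] can be made explicit under a simple binding model. This sketch assumes single-site, non-saturating binding with protein in excess over probe. Under those conditions the shifted fraction of probe, θ, grows roughly linearly with the protein concentration [P]. Real EMSA conditions may deviate from this.

```latex
\theta = \frac{[P]}{[P] + K_d} \approx \frac{[P]}{K_d} \qquad ([P] \ll K_d)

\theta_{FS} \approx \theta_{CZF}
\;\Rightarrow\;
\frac{[P]_{FS}}{K_{d,FS}} \approx \frac{[P]_{CZF}}{K_{d,CZF}}
\;\Rightarrow\;
\frac{K_{d,CZF}}{K_{d,FS}} \approx \frac{[P]_{CZF}}{[P]_{FS}} \approx 50
```

A lower K_d means a higher affinity. If the SIP1_(FS) complex were already approaching saturation, θ_FS would fall below [P]_FS/K_d,FS, and the true ratio would be larger than 50. This is consistent with the "at least" in the text.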

[0098] The SIP1_(FS) complex, like the SIP1_(CZF) and SIP1_(NZF) complexes, is absent when the mutated Xbra-D probe is used. Thus, an intact downstream site is again required for the binding of SIP1_(FS). In contrast to SIP1_(CZF) and SIP1_(NZF), which bind with similar affinities to the Xbra-WT and Xbra-E probes, SIP1_(FS) does not bind to the Xbra-E probe. Like SIP1_(CZF) and SIP1_(NZF), SIP1_(FS) does not bind to the Xbra-F probe. We conclude that the downstream site (AGGTG) is necessary for SIP1_(FS) to bind to the Xbra2 promoter. However, this site is not sufficient, because additional sequences upstream of the Xbra-E probe are necessary for the binding of SIP1_(FS). One possible reason why SIP1_(FS) was unable to bind to the Xbra-E probe is simply that the Xbra-E probe is shorter than the Xbra-WT probe. To test this, we prepared a probe containing a random sequence (Rdm) upstream of the Xbra-E probe (Table 1) in order to extend it to the same length as Xbra-WT. In contrast to SIP1_(CZF), which bound efficiently to the Rdm+Xbra-E probe, SIP1_(FS) was unable to bind. This result demonstrates that the length of the Xbra-E probe per se is not the cause of the failure of SIP1_(FS) to bind to this probe.
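A length-matched control such as Rdm+Xbra-E could be designed along the following lines. This is a minimal sketch, not the procedure actually used. It assumes that the random extension should not introduce any extra CACCT/AGGTG core, either within itself or across its junction with the probe. The function names and the example probe are hypothetical.

```python
import random
import re

# CACCT cores on either strand (top-strand CACCT or AGGTG); lookahead counts every position
CORES = re.compile("(?=CACCT|AGGTG)")


def count_cores(seq: str) -> int:
    return sum(1 for _ in CORES.finditer(seq))


def pad_with_random(probe: str, target_len: int, seed: int = 0, max_tries: int = 1000) -> str:
    """Prepend a random sequence so that the result is target_len bases long.

    Draws that create an extra CACCT/AGGTG core, inside the random part or
    across its junction with the probe, are rejected (an assumed design rule).
    """
    probe = probe.upper()
    n = target_len - len(probe)
    if n < 0:
        raise ValueError("probe is already longer than target_len")
    rng = random.Random(seed)  # seeded so the design is reproducible
    wanted = count_cores(probe)
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(n)) + probe
        if count_cores(candidate) == wanted:
            return candidate
    raise RuntimeError("no acceptable random extension found")
```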

[0099] To substantiate that the Xbra-F oligonucleotide also contains sequences necessary for the binding of SIP1_(FS), we fused this oligonucleotide, as well as a random sequence, upstream of another CACCT site known to be bound strongly by the AREB6 protein (Ref. 10) (probes Xbra-F+AREB6 and Rdm+AREB6, respectively). SIP1_(CZF) binds both the Xbra-F+AREB6 and Rdm+AREB6 probes with equal affinity, indicating that the AREB6 sequence is also recognized by SIP1_(CZF). However, SIP1_(FS) binds only to the Xbra-F+AREB6 probe and not to Rdm+AREB6. This observation confirms that the Xbra-F oligonucleotide contains sequences necessary for the binding of SIP1_(FS). In addition, the only common feature between the Xbra-E and the AREB6 probes is the CAGGTGT sequence, suggesting that no sequences other than this CAGGTGT in the Xbra-E probe are necessary for the binding of SIP1_(FS).

[0100] One of the reasons why SIP1_(FS) is unable to bind to the Xbra-E probe might be that the Xbra-E probe is shorter than the Xbra-WT probe. To test this hypothesis, we prepared a probe containing a random sequence upstream of the Xbra-E probe to obtain the same length as the Xbra-WT probe. In contrast to SIP1_(CZF), which binds efficiently to this probe, SIP1_(FS) was unable to bind. This result shows that the length of the Xbra-E probe was not the reason why SIP1_(FS) does not bind this probe. To substantiate that the Xbra-F oligonucleotide also contains sequences necessary for the binding of SIP1_(FS), we fused that oligonucleotide and a random sequence upstream of another CACCT site known to be strongly bound by the AREB6 protein (Xbra-F+AREB6 and Rdm+AREB6, respectively). We observed that SIP1_(CZF) binds (with equal affinity) to both the Xbra-F+AREB6 and Rdm+AREB6 probes, indicating that the AREB6 sequence is also recognized by SIP1_(CZF). However, SIP1_(FS) binds only to the Xbra-F+AREB6 probe and not to the Rdm+AREB6 probe. This confirms that the Xbra-F oligonucleotide contains sequences necessary for the binding of SIP1_(FS). In addition, the only common denominator between the Xbra-E and the AREB6 probes is the AGGTG sequence, suggesting that no sequences other than this AGGTG in the Xbra-E probe are necessary for the binding of SIP1_(FS).

[0101] To map the sequences within Xbra-F that, in conjunction with the Xbra-E sequence, are required for the binding of SIP1_(FS), we prepared a series of probes, identical in length to Xbra-WT, containing adjacent triple mutations within the Xbra-F part (see Table 1). Only three of these mutated probes (i.e., Xbra-L, Xbra-M and Xbra-N) affected the binding of SIP1_(FS). Indeed, the upstream CACCT sequence, which is intact in the Xbra-F probe, was modified in the L, M and N probes. We also showed that SIP1_(FS) does not bind to the Xbra-S probe, which contains a point mutation changing the upstream CACCT into CATCT. This mutation is similar to the downstream AGATG mutation made within the Xbra-D probe.
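The scan in [0101] can be sketched as follows. Adjacent 3 bp windows of a region are mutated one at a time, so the probe length does not change, and each mutant is flagged if it touches the CACCT core. The substitution scheme (A↔C, G↔T) is a hypothetical choice, since the actual Table 1 mutations are not given in this section. The example sequence is likewise a placeholder.

```python
# Hypothetical substitution scheme; the actual Table 1 mutations are not given here.
SUBSTITUTE = {"A": "C", "C": "A", "G": "T", "T": "G"}


def triple_mutant_scan(seq: str, start: int, end: int) -> list[tuple[int, str, bool]]:
    """Return (window start, mutant probe, hits_core) for adjacent 3 bp windows in seq[start:end].

    Every mutant keeps the full probe length; hits_core is True when the
    mutated window overlaps the first CACCT core found in the scanned region.
    """
    seq = seq.upper()
    core = seq.find("CACCT", start, end)
    scan = []
    for i in range(start, end - 2, 3):
        mutated = "".join(SUBSTITUTE[base] for base in seq[i:i + 3])
        mutant = seq[:i] + mutated + seq[i + 3:]
        hits_core = core != -1 and i < core + 5 and core < i + 3
        scan.append((i, mutant, hits_core))
    return scan
```

Depending on how the 5 bp core aligns with the scan, it overlaps either two or three adjacent 3 bp windows.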

[0102] The results described above indicate that SIP1_(FS) contacts both CACCT sequences in the Xbra promoter. To further investigate the importance of these sites, a DNA methylation interference assay was carried out. The methylation of the three Gs of the downstream AGGTG (SIP_(DO)) and of the two Gs of the upstream CACCT (SIP_(UP)) was significantly lower in the SIP1_(FS)-bound probe than in the unbound probe, suggesting that the methylation of these Gs interfered with the binding of SIP1_(FS). This finding strongly supports the view that these residues are essential for SIP1_(FS) binding. The methylation of one of the 2 Gs located very close to SIP_(DO) also interfered with the binding of SIP1_(FS). Consequently, for SIP1_(FS), two CACCT sequences and their integrity are required for DNA binding.
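The bound-versus-unbound comparison in [0102] reduces to a per-G signal ratio. This is a minimal sketch in which the band intensities and the 0.5 cut-off are hypothetical illustrations, not values from the assay.

```python
def interfering_gs(bound: dict[int, float], unbound: dict[int, float], threshold: float = 0.5) -> list[int]:
    """Return G positions whose methylation signal in the protein-bound probe
    falls below `threshold` times the signal in the unbound probe.

    Keys are G positions and values are band intensities; the default
    cut-off is an arbitrary illustrative choice.
    """
    return sorted(
        pos
        for pos, free in unbound.items()
        if free > 0 and bound.get(pos, 0.0) / free < threshold
    )
```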

[0103] SIP1 and δEF1 require 2 CACCT sequences for binding to different potential candidate sites: SIP1 and δEF1 have a very similar structure, with two very highly conserved zinc finger clusters, and it is likely that these two proteins bind DNA in a similar way. We set out to determine whether δEF1 also binds to the Xbra2 promoter by contacting both CACCT sequences. Myc-tagged δEF1 was expressed in COS cells, and the corresponding nuclear extracts were tested in EMSA with WT and a panel of mutated Xbra probes. δEF1 binds strongly to the Xbra-WT probe, which contains both CACCT sites. However, like SIP1_(FS), δEF1 binds neither the Xbra-E probe, comprising only the downstream CACCT site, nor the Xbra-F probe, containing only the upstream CACCT site. In addition, a point mutation of either the upstream CACCT (Xbra-S) or the downstream CACCT site (Xbra-D) also abolished the binding of δEF1. Therefore, like SIP1_(FS), full-length δEF1 also requires the integrity of both CACCT sequences for binding to the Xbra2 promoter. The requirement of two CACCT sites for the binding of SIP1_(FS) as well as δEF1 might be unique to the Xbra2 promoter. Therefore, the next question was whether two CACCT sequences are also necessary for SIP1/δEF1 binding to other target sites. Putative δEF1 and SIP1 binding elements are present in several promoters. One putative δEF1 binding element, indeed containing two intact and spaced CACCT sites, was found within the promoter of the human α4-integrin gene (23). Interestingly, both sites are contained within E2 boxes. Mutation of these two CACCT sites led to de-repression of α4-integrin gene expression in myoblasts, suggesting that δEF1 is a repressor of α4-integrin gene transcription (23). Since these two CACCT sites are closely positioned in the promoter (spacing is 34 bp), we investigated whether both CACCT sequences are required for the binding of δEF1.
For this purpose, a 60 bp-long probe overlapping both CACCT sites of the α4-integrin promoter was synthesized (α4I-WT), as well as two mutated versions having a point mutation in either the upstream (α4I-B) or the downstream (α4I-A) CACCT site (see Table 1). These probes were tested for binding in EMSAs with extracts of COS cells transfected with either δEF1 or SIP1_(FS). Both δEF1 and SIP1_(FS) form strong complexes with the α4I-WT probe. The δEF1 complex was entirely supershifted with an anti-Myc antibody, confirming that it contains the Myc-tagged δEF1. The binding of both SIP1 and δEF1 is abolished or strongly affected by a mutation of either the upstream or the downstream CACCT site. Moreover, competition experiments revealed that 50 ng of unlabeled α4I-WT probe was sufficient to abolish the binding of SIP1 or δEF1 to the α4I-WT probe, whereas 50 ng of either unlabeled α4I-A or α4I-B probe was not. We concluded that both SIP1_(FS) and δEF1 require the integrity of two CACCT sites for binding to the promoter of the α4-integrin gene.

[0104] We also found two closely positioned CACCT sites within the promoter of the human E-cadherin gene. An oligonucleotide comprising both CACCT sites of this E-cadherin promoter was used as a probe (Ecad-WT) together with SIP1_(FS) or δEF1 extracts in EMSAs. Both SIP1_(FS) and δEF1 form a complex with this probe. However, when either the upstream (Ecad-A probe) or the downstream (Ecad-B probe) CACCT site was mutated, the binding of SIP1_(FS) and δEF1 was abolished. This finding also suggests that the two CACCT sites in this promoter represent a high-affinity site for the binding of two-handed zinc finger/homeodomain transcription factors.

[0105] From the alignment of the Xbra-WT, α4I-WT and Ecad-WT probes (see Table 1), we observed no obvious homology, except for one CACCTG site and a second CACCT site. Our results described herein and this alignment indicate that only these sequences participate in the binding of either SIP1_(FS) or δEF1. We therefore conclude that, for binding to target promoters, SIP1_(FS) or δEF1 require at least one CACCT site and one CACCTG site.
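The motif requirement stated in [0105] can be written as a simple sequence test. The element must contain at least two CACCT cores on either strand, and at least one of them must be in a CACCTG context. On the bottom strand, that context is read on the top strand as CAGGTG. This is a hypothetical encoding of the conclusion only. It ignores flanking context, which [0109] shows can matter.

```python
import re


def core_sites(seq: str) -> list[tuple[int, str, bool]]:
    """Return (start, strand, is_cacctg) for every CACCT core on either strand.

    '+' is a top-strand CACCT; '-' is a top-strand AGGTG (CACCT on the bottom
    strand). is_cacctg is True when the core is followed by G in its own
    orientation, i.e. CACCTG on '+' or CAGGTG on the top strand for '-'.
    """
    seq = seq.upper()
    sites = []
    for m in re.finditer("(?=CACCT)", seq):
        i = m.start()
        sites.append((i, "+", seq[i + 5:i + 6] == "G"))
    for m in re.finditer("(?=AGGTG)", seq):
        i = m.start()
        sites.append((i, "-", i > 0 and seq[i - 1] == "C"))
    return sorted(sites)


def meets_two_site_rule(seq: str) -> bool:
    """At least two CACCT cores, at least one of them in a CACCTG context."""
    sites = core_sites(seq)
    return len(sites) >= 2 and any(is_cacctg for _, _, is_cacctg in sites)
```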

[0106] Spacing variations and orientation of the CACCT sites: Within the Xbra-WT, α4I-WT and Ecad-WT probes (Table 1), the spacing between the two CACCT sequences was 24, 34 and 44 bp, respectively. Since SIP1_(FS) and δEF1 bind efficiently to these probes, these proteins can accommodate a spacing between the two CACCT sites ranging from 24 bp to at least 44 bp. To further investigate whether the spacing between the two CACCT sites is an important parameter for binding, we generated different Xbra probes with deletions between these sites. Two mutant probes (Xbra-B and Xbra-C) have a deletion of 3 adenines, whereas probe Xbra-U has a deletion of 10 nucleotides, shortening the original 24 bp spacing to 21 bp and 14 bp, respectively. These probes were tested in EMSA with extracts from COS cells expressing either SIP1_(FS) or δEF1. Both SIP1_(FS) and δEF1 bind with equal affinity to the Xbra-WT, Xbra-B, Xbra-C and Xbra-U probes. As already suggested by the results obtained with different promoters, this indicates that, also within the same promoter element, the spacing between the two CACCT sites is not a critical parameter for the binding of these two transcription factors.

[0107] By extensive comparison of the Xbra-WT, α4I-WT and Ecad-WT probes, we observed that in the Xbra-WT and α4I-WT probes the two CACCT sites are arranged as CACCT-N-AGGTG, whereas in Ecad-WT the arrangement is AGGTG-N-CACCT. Because the CACCT site is not palindromic, these two arrangements could be expected to differ substantially. However, SIP1_(FS) and δEF1 bind to these differently oriented sites with comparable affinities, suggesting that SIP1_(FS) and δEF1 can bind irrespective of the orientation of the two CACCT sites.
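The arrangements compared in [0107] can be classified mechanically. The sketch below assigns a two-core element to one of the four arrangements named in the abstract (CACCT-N-CACCT, CACCT-N-AGGTG, AGGTG-N-CACCT or AGGTG-N-AGGTG) and returns the gap N in bp. The test sequences are hypothetical placeholders.

```python
import re


def arrangement(seq: str) -> tuple[str, int]:
    """Classify a two-core element (e.g. 'CACCT-N-AGGTG') and return the gap N in bp."""
    seq = seq.upper()
    sites = sorted(
        [(m.start(), "CACCT") for m in re.finditer("(?=CACCT)", seq)]
        + [(m.start(), "AGGTG") for m in re.finditer("(?=AGGTG)", seq)]
    )
    if len(sites) != 2:
        raise ValueError(f"expected exactly two cores, found {len(sites)}")
    (p1, m1), (p2, m2) = sites
    return f"{m1}-N-{m2}", p2 - (p1 + 5)
```

Reading the other strand maps CACCT-N-AGGTG onto itself and AGGTG-N-CACCT onto itself, while CACCT-N-CACCT and AGGTG-N-AGGTG convert into each other. The Xbra-WT/α4I-WT and Ecad-WT arrangements therefore remain distinct whichever strand is read.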

[0108] To further investigate the effect of the orientation of the two CACCT sites on the DNA-binding capacity of SIP1_(FS) and δEF1, additional probes were designed. Probe Xbra-EE contained a tandem repeat of the Xbra-E sequence, whereas probe Xbra-ErE contained an inverted repeat of the same Xbra-E sequence. In addition, we synthesized Xbra-V, in which the upstream CACCT site (plus one extra base pair on each side) was replaced by the downstream AGGTG sequence and vice versa. Finally, in the Xbra-W probe, only the downstream site was replaced by the upstream CACCT sequence. All these probes were again tested in EMSAs with extracts prepared from COS cells expressing either SIP1_(FS) or δEF1. We observed the strongest binding of SIP1_(FS) or δEF1 to the Xbra-EE probe. Therefore, although SIP1_(FS) and δEF1 cannot bind to Xbra-E, which contains a single CACCT site, they bind strongly when this sequence is duplicated, again indicating the requirement for 2 CACCT sites. In addition, these two sites have to be present on the same DNA fragment and not on two separate DNA fragments (see below). SIP1 and δEF1 bind to Xbra-ErE, also suggesting that the relative orientation of the two CACCT sites is not critical for binding. Furthermore, switching both the upstream and the downstream sites (probe Xbra-V) or replacing only the upstream site by a second copy of the downstream site (probe Xbra-W) did not affect SIP1_(FS) and δEF1 binding. From these experiments, we conclude that neither the spacing between the two CACCT sites nor their relative orientation is critical for the in vitro binding of two-handed zinc finger/homeodomain transcription factors.
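The probe designs in [0108] can be expressed as simple sequence operations. This is a minimal sketch that assumes an "inverted repeat" means the element followed directly by its reverse complement, with no spacer. It also assumes the Xbra-V swap exchanges two 7 bp windows, each consisting of a 5 bp core plus one flanking base on each side. The function names and the example element are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tandem_repeat(element: str) -> str:
    """Xbra-EE-style probe: the element followed by a second copy of itself."""
    return element + element


def inverted_repeat(element: str) -> str:
    """Xbra-ErE-style probe: the element followed by its reverse complement."""
    return element + revcomp(element)


def swap_windows(seq: str, a: int, b: int, width: int = 7) -> str:
    """Xbra-V-style probe: exchange two non-overlapping windows of `width` bp
    (a 5 bp core plus one flanking base on each side)."""
    if a + width > b or b + width > len(seq):
        raise ValueError("windows overlap, are out of order, or run past the end")
    return seq[:a] + seq[b:b + width] + seq[a + width:b] + seq[a:a + width] + seq[b + width:]
```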

[0109] Surprisingly, not all duplicated CACCT sites can bind these factors. In fact, a duplication of the Xbra-F sequence, which in combination with the Xbra-E sequence was shown to be necessary for the binding of SIP1_(FS) and δEF1, is refractory to binding of SIP1_(FS) and δEF1. This suggests that the CACCT site within the Xbra-F context is a low-affinity site and that sequences adjacent to this CACCT site may optimize the affinity. In addition, the fact that neither the C-terminal nor the N-terminal cluster can bind independently to the Xbra-F probe supports the assumption that this site has low affinity. In contrast, the CACCTG site present in the Xbra-E probe can bind SIP1_(CZF) and SIP1_(NZF), and a duplication of this element creates a high-affinity binding site for both SIP1_(FS) and full-length δEF1. This suggests that the terminal G in the downstream site may also allow discrimination between high- and low-affinity binding sites. However, the CACCT site in Xbra-F may be bound by one of the zinc finger clusters of SIP1_(FS) only once the other cluster has occupied the neighboring high-affinity CACCTG site (in Xbra-E). To confirm the importance of the terminal G for the binding of SIP1_(FS) and δEF1, we mutated the downstream CACCTG site to CACCTA (probe Xbra-Z). The binding of SIP1_(FS) or δEF1 to the Xbra-Z probe decreased strongly (compared with the Xbra-WT probe), suggesting that this G residue is important for generating a high-affinity binding site for both SIP1_(FS) and δEF1.

[0110] Finally, when Xbra-E and Xbra-F probes are mixed before adding SIP1_(FS) or δEF1, no binding is observed, again indicating that both CACCT sites have to be in the cis configuration, i.e., on the same DNA molecule.

[0111] SIP1 and δEF1 bind to DNA elements containing two CACCT sites, and both of these proteins contain two clusters of zinc fingers capable of binding independently to CACCT sites. In subsequent work, we evaluated the importance of each zinc finger cluster for the binding of SIP1_(FS) to DNA. Mutations destroying either the third or the fourth zinc finger of the N-terminal cluster of δEF1 (δEF1_(NZF)) were shown to abolish the binding of this cluster to DNA. Similarly, mutagenesis of the second or the third zinc finger in the C-terminal cluster also abolished the binding of δEF1_(CZF) to CACCT (10). Therefore, we introduced mutations similar to those in δEF1 into the SIP1_(NZF) and SIP1_(CZF) clusters. These mutated and wild-type clusters were fused to GST, and the fusion proteins were purified from bacteria. Both wild-type SIP1_(NZF) and SIP1_(CZF) strongly bind to the Xbra-E probe. However, with the same amount of purified mutant cluster/GST fusion proteins (GST-NZF3, GST-NZF4, GST-CZF2 and GST-CZF3), no binding to the Xbra-E probe could be detected with any of these fusion proteins. Thus, these mutations abolish the capacity of each cluster (SIP1_(NZF) and SIP1_(CZF)) to bind independently to a CACCT site.

[0112] We then introduced similar mutations in full-size SIP1 (NZF3-Mut, NZF4-Mut, CZF2-Mut and CZF3-Mut) and over-expressed these SIP1 mutants in COS cells as Myc-tagged proteins. The expression of the different mutants was established and normalized by Western blot analysis using anti-Myc antibody. By means of EMSAs, we observed that WT SIP1 binds strongly to the Xbra-WT probe and that the SIP1 complex is super-shifted upon incubation with an anti-Myc antibody. In contrast, none of the mutant forms of full-size SIP1 was able to form a SIP1-like complex or a SIP1 super-shifted complex. The same observations were made when the α4I-WT probe was used. In conclusion, full-size SIP1 requires the binding capacities of both intact zinc finger clusters to bind to its target, which necessarily contains 2 CACCT sites. The effect of these mutations on the repressor activity of SIP1 was tested in a transfection assay using the p3TP-Lux reporter plasmid. This plasmid contains three copies, each with one CACCT, of a sequence covering the −73 to −42 region of the human collagenase promoter (de Groot and Kruijer, 1990). SIP1 bound to a fragment containing this multimerized element, but neither NZF3-Mut nor CZF3-Mut was able to bind. Over-expression of SIP1 in CHO cells leads to a strong repression of the basal transcriptional activity of p3TP-Lux. However, the repression was 6- to 7-fold lower upon over-expression of SIP1 mutants defective in DNA binding (NZF3-Mut or CZF3-Mut). Therefore, the integrity of both zinc finger clusters is necessary for both DNA binding and optimal, i.e., wild-type, repressor activity of SIP1.

[0113] SIP1 binds to DNA as a monomer: The observation that the integrity of both zinc finger clusters is required for SIP1 binding to two CACCT sequences suggests that SIP1 binds as a monomer, in which each zinc finger cluster contacts one such site. However, it could be hypothesized that SIP1 binds its target sites as a dimer, implying that one SIP1 molecule of the dimer would bind one CACCT site via its N-terminal zinc finger cluster, while the second SIP1 molecule would contact the DNA via its C-terminal zinc finger cluster. Since both zinc finger clusters are necessary for binding, the zinc finger cluster not interacting with the DNA would then be involved in dimerization. Consequently, some combinations of NZF and CZF mutants should generate a dimer configuration that binds DNA. However, binding to the Xbra-WT probe could not be detected with any combination of NZF and CZF mutants. Although we cannot rule out that these mutations also affect potential dimer formation, it is highly unlikely that the same mutation affects both the DNA-binding capacity and the protein-protein interaction. Moreover, it is highly unlikely that two different mutants (having different mutations within a cluster) would behave the same way.
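The logic of the mixing experiment in [0113] can be laid out as an enumeration. The code lists all pairs of different mutant extracts and records which pairs each model predicts should bind. It assumes, as the text does for argument's sake, that the mutations affect DNA contact but leave any dimerization intact. The mutant labels follow the text; the function names are hypothetical.

```python
from itertools import combinations

MUTANTS = ["NZF3-Mut", "NZF4-Mut", "CZF2-Mut", "CZF3-Mut"]


def intact_clusters(mutant: str) -> set[str]:
    """Clusters still able to contact DNA: the mutated one is lost."""
    return {"NZF", "CZF"} - {mutant[:3]}


def dimer_predicts_binding(a: str, b: str) -> bool:
    """Dimer model from the text: one subunit contacts DNA through its NZF, the
    other through its CZF (assumes the mutations leave dimerization intact)."""
    ia, ib = intact_clusters(a), intact_clusters(b)
    return ("NZF" in ia and "CZF" in ib) or ("CZF" in ia and "NZF" in ib)


def monomer_predicts_binding(a: str, b: str) -> bool:
    """A monomer needs both clusters intact in one molecule; no single mutant has that."""
    return any(intact_clusters(m) == {"NZF", "CZF"} for m in (a, b))


for a, b in combinations(MUTANTS, 2):
    print(f"{a} + {b}: dimer={dimer_predicts_binding(a, b)}, monomer={monomer_predicts_binding(a, b)}")
```

Under these assumptions, the dimer model predicts binding for the four NZF/CZF mixtures, and the monomer model predicts binding for none of the six pairs. No binding was observed with any combination, which matches the monomer model.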

[0114] To address this experimentally, we used a combination of differently tagged SIP1 proteins in EMSA supershift experiments. First, we produced Myc-tagged and FLAG-tagged SIP1_(FS) separately at comparable levels in COS cells and confirmed that both proteins bind to DNA with similar affinities. The complex generated with Myc-tagged SIP1 migrates slightly more slowly than the FLAG-tagged complex (the Myc tag is longer than the FLAG tag). Extracts prepared from COS cells expressing similar amounts of both Myc-tagged and FLAG-tagged SIP1 were incubated with the Xbra-WT probe and used in EMSAs. We observed the formation of a broad SIP1 complex that combines the fast-migrating FLAG-tagged SIP1 complex with the slow-migrating Myc-tagged SIP1 complex. Using an anti-FLAG antibody, only the lower part of the complex, corresponding to FLAG-tagged SIP1, is super-shifted, whereas about 50% of the radioactivity remains within the Myc-tagged SIP1 complex. This indicates that the latter SIP1 complex is not super-shifted with the anti-FLAG antibody. Conversely, incubating the extract with an anti-Myc antibody super-shifted only the upper part of the complex, corresponding to Myc-tagged SIP1, whereas 50% of the radioactivity was retained within the FLAG-tagged SIP1 complex. Again, this indicates that no FLAG-tagged SIP1 is super-shifted with an anti-Myc antibody. Using both antibodies, we observed the same two super-shifted bands, corresponding to the Myc-tagged and the FLAG-tagged super-shifted complexes, in the upper part of the gel. If SIP1 dimers were formed, at least some heterodimers would be assembled from Myc-tagged and FLAG-tagged SIP1. However, we detected no additional band corresponding to a potential double super-shift, i.e., a complex super-shifted by both anti-Myc and anti-FLAG antibodies. Hence, this experiment revealed no detectable dimer formation between FLAG-tagged and Myc-tagged SIP1.
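The expected band pattern in [0114] follows from simple bookkeeping over tag combinations. This is a minimal sketch assuming three things: both tagged forms bind DNA equally well, the two forms are present at equal levels, and, for the dimer model, subunits associate at random. The function names are hypothetical.

```python
from itertools import product


def species(model: str, myc_fraction: float = 0.5) -> dict[tuple[str, ...], float]:
    """Expected fraction of probe-bound complexes carrying each tag combination."""
    share = {"Myc": myc_fraction, "FLAG": 1.0 - myc_fraction}
    if model == "monomer":
        return {(tag,): p for tag, p in share.items()}
    if model == "dimer":
        out: dict[tuple[str, ...], float] = {}
        for a, b in product(share, repeat=2):  # random pairing of subunits
            key = tuple(sorted((a, b)))
            out[key] = out.get(key, 0.0) + share[a] * share[b]
        return out
    raise ValueError(f"unknown model: {model}")


def unshifted_fraction(model: str, antibody: str, myc_fraction: float = 0.5) -> float:
    """Fraction of bound probe left in complexes that lack the antibody's epitope."""
    return sum(p for tags, p in species(model, myc_fraction).items() if antibody not in tags)


def double_shift_expected(model: str) -> bool:
    """True if some complex carries both tags and could be shifted by both antibodies."""
    return any(set(tags) == {"Myc", "FLAG"} for tags in species(model))
```

Under these assumptions, a monomer leaves half of the bound probe unshifted with either single antibody, matching the roughly 50% reported. Random dimers would leave only a quarter unshifted and would also produce a double-shifted band.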

[0115] Finally, FLAG-tagged SIP1 in a COS cell extract was immunoprecipitated in the presence of a large excess of DNA-binding sites. However, no co-immunoprecipitation of Myc-tagged SIP1 could be detected. The reciprocal experiment, i.e., immunoprecipitation with an anti-Myc antibody and detection with an anti-FLAG antibody, did not reveal any SIP1 dimer either. Taken together, these observations lead us to conclude that SIP1 binds as a monomer to the Xbra-WT probe.

[0116] Mutations in either the upstream or the downstream CACCT lead to ectopic activity of the Xbra2 promoter in transgenic frog embryos: SIP1 binds to the Xbra2 promoter and represses expression of endogenous Xbra2 mRNA when overexpressed in Xenopus embryos (Verschueren et al., 1999). To analyze the importance of CACCT sequences in the regulation of the Xbra2 promoter in vivo, we tested whether mutations of these sequences would affect Xbra2 promoter activity in transgenic embryos. Xbra2 promoter sequences were fused upstream of the green fluorescent protein (GFP) gene, and this reporter cassette was used for transgenesis. A 2.1 kb-long Xbra2 promoter fragment was sufficient to drive reporter protein synthesis in the same domain of the embryo as endogenous Xbra mRNA, i.e., the marginal zone (85% of the embryos, stage 11, n=57). The exception is the organizer region, for which a regulatory element may be lacking in the reporter cassette tested here.

[0117] A single point mutation within the downstream CACCT site of the promoter, which disrupted SIP1 binding (Xbra2-Mut1) and is identical to that in Xbra-D, had a severe effect on the spatial production of the reporter protein. All embryos showed ectopic expression in the inner ectoderm layer. Mutations within the upstream CACCT sequence (Xbra2-Mut4) also affected SIP1 binding. We observed in all transgenic embryos (n>30) the same ectopic expression as for the Xbra2-Mut1 mutation. Mutation of the downstream CACCTG to CACCTA (Xbra2-Mut2) also affects SIP1 binding to such a probe. This mutation, when introduced into the 2.1 kb Xbra2 promoter, also led to ectopic expression of GFP mRNA in all transgenic embryos tested (n>30). We also tested a mutation (Xbra2-Mut3) that decreased the original 24 bp spacing between the two CACCT sequences by 3 bp. This mutation weakened the interaction of such a probe with SIP1. This was also reflected in the corresponding transgenic embryos (n=37): while 35% of the embryos showed the same expression pattern as the wild-type 2.1 kb Xbra2 promoter fragment, 65% had either patchy or weak continuous expression in the inner ectoderm layer.

[0118] A clear correlation existed between the effect of these mutations on SIP1 binding affinity in EMSA and both the phenotype (ectopic expression of the reporter gene) and its penetrance in vivo. This indicates the importance of the SIP1 target sites in the normal regulation of Xbra2 expression in Xenopus development (stage 11). It also suggests that a hitherto unknown Xenopus SIP1-like repressor regulates Xbra2 gene expression in vivo. In addition, it confirms that SIP1-like factors require two intact CACCT sites for regulating target promoters such as Xbra2.

[0119] SIP1 induces invasion by down-regulation of E-cadherin: SIP1 represses E-cadherin promoter activity by binding to two conserved E-boxes. To elucidate whether SIP1 binding affects the transcriptional activity of the human E-cadherin promoter (−308/+41), we transiently co-expressed full-length SIP1 with E-cadherin promoter-driven reporter constructs in the E-cadherin-positive cell lines NMe (mouse), MDCK (dog) and MCF7/AZ (human). SIP1 expression led to an 80% decrease in human E-cadherin promoter activity. To address the binding specificity of SIP1 for the 2 conserved E-boxes, either the upstream E-box1 (−75), the downstream E-box3 (−25), or both E-boxes simultaneously were mutated. When co-transfection was performed with SIP1 cDNA and the mutant E-cadherin promoter constructs (68), a de-repression of human E-cadherin promoter activity was consistently observed. In addition, mutated SIP1 constructs were co-transfected with the human E-cadherin promoter. Mutation of the N-terminal or C-terminal zinc finger cluster resulted in only a slight de-repression of E-cadherin promoter activity. Interestingly, co-transfection of the human E-cadherin promoter and a SIP1 double mutant, affected in both zinc finger clusters, resulted in a considerable loss of SIP1-mediated repression of E-cadherin promoter activity. We can therefore conclude that SIP1 represses E-cadherin promoter activity by binding to the 2 E-boxes and that the 2 zinc finger clusters are indeed needed for full repression of E-cadherin promoter activity.

[0120] Inducible expression of SIP1 results in dose-dependent loss of E-cadherin protein and mRNA. To determine whether SIP1 affects endogenous E-cadherin expression levels, E-cadherin-positive MDCK-Tetoff cells with high expression of the tTA transactivator were stably transfected with a plasmid expressing a Myc₆-tagged full-length mouse SIP1 cDNA under control of a tTA-responsive element. To induce SIP1, cells were grown without tetracycline for 3 days. Immunofluorescence analysis of E-cadherin and SIP1 expression in a representative cloned transfectant revealed induced SIP1 in the nucleus, concomitant with total loss of the typical honeycomb E-cadherin pattern at cell-cell contacts. Western blot analysis confirmed these results. SIP1 induction occurred at tetracycline concentrations equal to or lower than 2 g/ml. As the tetracycline concentration was gradually decreased, E-cadherin was more strongly repressed, and this correlated inversely with SIP1 accumulation. We further checked whether the catenins, which link E-cadherin to the actin cytoskeleton, were influenced by SIP1 expression. On Western blots, neither αE-catenin nor β-catenin appeared to be affected, and this was confirmed by immunofluorescence. Equal amounts of total RNA from non-induced and induced cells were analyzed by Northern blotting. After hybridization with an E-cadherin-specific probe, the SIP1-expressing cells showed almost no E-cadherin mRNA, whereas the non-induced cells (+tet) expressed normal amounts of E-cadherin mRNA. These results corroborate the reporter assays: induction of SIP1 expression down-regulates endogenous E-cadherin at the mRNA level.

[0121] SIP1 expression in human carcinoma cell lines: We performed Northern blot analyses to examine the expression of SIP1 in a panel of E-cadherin-negative and -positive cell lines. To avoid possible cross-hybridization to other members of the δEF1 family, appropriate mouse and human SIP1 cDNA fragments were used as probes. We noted a clear-cut, strong inverse correlation between SIP1 expression and E-cadherin expression. High expression of SIP1 was found in human fibroblasts, and the most prevalent expression of SIP1 was found in E-cadherin-negative carcinoma cells reported to have a methylated E-cadherin promoter (53). Because the SIP1 expression pattern in these cell lines parallels the snail mRNA expression reported in E-cadherin-negative cell lines (66), we examined snail expression levels in our conditional SIP1-expressing cell line MDCK-Tetoff-SIP1. Snail expression could not be detected after SIP1 induction. Hence, in our cell system, E-cadherin repression is not snail-related.

[0122] SIP1 enhances the malignant phenotype by promoting loss of cell-cell adhesion and invasion. As E-cadherin is a well-known invasion-suppressor molecule (47), we asked whether SIP1 induction switches the cells to a more invasive phenotype. A cell aggregation assay was performed on non-induced versus induced MDCK-Tetoff-SIP1 cells. The non-induced MDCK-Tetoff-SIP1 cells showed significant aggregation after 30 min, but SIP1 induction abrogated normal cell-cell aggregation to a similar extent as the E-cadherin-blocking antibody DECMA-1. SIP1 induced invasion into collagen type I gels as efficiently as the DECMA-1 antibody did.

[0123] SIP1 expression results in the reduction of unidirectional cell migration. The role of E-cadherin in cell migration has been demonstrated by blocking E-cadherin with a specific antibody, which reduces unidirectional cell migration (72). The effect of SIP1 expression on cell migration, through down-regulation of E-cadherin, was studied in a wound assay in the inducible MDCK-Tetoff-SIP1 cell line. We demonstrated that induction of SIP1 results in lower unidirectional cell migration. Down-regulation of E-cadherin-mediated cell-cell contact thus disturbs unidirectional migration.

[0124] DISCUSSION: Invasion and metastasis are believed to be the most crucial steps in tumor progression. Malignancy of carcinoma cells is characterized by loss of both cell-cell adhesion and cellular differentiation, and this has frequently been reported to correlate with E-cadherin down-regulation. Loss of E-cadherin expression has been attributed to transcriptional dysregulation (52, 73). We show here that the zinc finger protein SIP1 represses E-cadherin expression at the transcriptional level by binding to the conserved E-boxes present in the minimal E-cadherin promoter. The specific binding of SIP1 to the two E-boxes was confirmed by mutagenesis of either the zinc finger clusters of SIP1 or the E-box sequences in the E-cadherin promoter. Indeed, such mutations resulted in loss of SIP1-mediated repression of E-cadherin promoter activity. These results are compatible with the finding that comparable E-box mutations up-regulate E-cadherin promoter activity in E-cadherin-negative cell lines, where the wild-type promoter shows low activity (Refs. 56, 58). Stable transfection of the transcriptional repressor SIP1 induces down-regulation of E-cadherin at both the mRNA and the protein level. A wound assay demonstrates that SIP1 interferes with the unidirectional migration mediated by functional E-cadherin cell-cell contacts: weaker cell-cell contact results in more multi-directional migration of the epithelial cells. A striking correlation between down-regulated E-cadherin and up-regulated SIP1 expression was seen in various human tumor cells. Finally, we demonstrate here that the down-regulation of E-cadherin due to SIP1 expression is also associated with a remarkable increase in invasion capacity. Hence, SIP1 can be considered an invasion inducer owing to its binding to the E-cadherin promoter.
The fact that the transcriptional repressor Snail also specifically binds E-boxes, resulting in transcriptional E-cadherin repression (66, 67), raised the question whether the E-cadherin repression in our studies is Snail-mediated. Snail mRNA up-regulation could not be detected in the conditional SIP1-expressing MDCK-Tetoff-SIP1 cell line. These data led us to consider SIP1 as the effector of transcriptional E-cadherin repression in our cell system. This idea is supported by the finding that E-box mutations relieve SIP1-mediated repression of the E-cadherin promoter more extensively than has been reported for Snail: de-repression of E-cadherin promoter activity upon cotransfection with SIP1 is already detected with a single E-box mutation, whereas for Snail cotransfection a clear de-repression was seen only when more E-boxes were mutated in the human E-cadherin promoter (66). The high expression of SIP1 in the breast cancer cell lines MDA-MB435S and MDA-MB231 is remarkable. These tumor cell lines have been described to bear a hypermethylated E-cadherin promoter (53). However, this does not rule out an important role for SIP1 in repressing the endogenous E-cadherin promoter, since E-box mutations strongly reactivate the activity of an exogenous E-cadherin promoter in these cell lines. Indeed, recent research has made clear that many transcription factors function by recruiting multiprotein complexes with chromatin-modifying activities to specific sites on DNA (74). Another Smad-interacting transcription factor, TGIF, has already been shown to associate with histone deacetylase (75). DNA methylation and chromatin condensation could therefore act synergistically with histone deacetylation to repress gene transcription (76).

[0125] Materials and methods. Cell Culture and Reagents: The MDCK-Tetoff cell line was obtained from Clontech (Palo Alto, Calif.). This cell line is derived from the Madin-Darby canine kidney (MDCK) type II epithelial cell line and stably expresses the Tet-off transactivator tTA (77). MCF7/AZ is a cell line derived from MCF7, a human mammary carcinoma cell line (78). The NMe cell line is an E-cadherin-expressing subclone of NMuMG, an epithelial cell line from normal mouse mammary gland (47). MDA-MB231 is a human breast cancer cell line (ATCC, Manassas, Va.).

[0126] Plasmids: The full-size mouse SIP1 cDNA sequence was cloned into the Myc-tag-containing pCS3 eukaryotic expression vector derived from pCS2 (69). The resulting plasmid was designated “pCS3-SIP1FS”. Remacle et al. (68) described mutagenesis of the zinc finger clusters of SIP1. For the construction of the inducible vector pUHD10.3SIP1, a ClaI/XbaI fragment from pCS3SIP1FS was cloned into the EcoRI/XbaI-cut pUHD10.3 vector (79). The ClaI site of the SIP1 fragment and the EcoRI site of the vector were blunted using Pfu polymerase (Stratagene; La Jolla, Calif.). The E-cadherin promoter sequence (−341/+41) was obtained by PCR on genomic DNA from the human MCF7/AZ cell line. The PCR primers used were 5′-ACAAAAGAACTCAGCCAAGTG-3′ (SEQ ID NO: 43) and 5′-CCGCAAGCTCACAGGTGC-3′ (SEQ ID NO: 44). The GC-melt kit (Clontech; Palo Alto, Calif.) was used for efficient amplification. The PCR product was blunted, kinased and then cloned into the pGL3basic vector (Promega; Madison, Wis.), which was opened at the SrfI site. Using the KpnI-HindIII sites in this luciferase reporter construct, the E-cadherin promoter was also transferred to the pGL3enhancer vector. Mutagenesis of the E-boxes in the human E-cadherin promoter was performed with the QuikChange Site-Directed Mutagenesis Kit (Stratagene) using the following primers:

[0127] forward primer E-box1: 5′-gctgtggccggCAGATGaaccctcag-3′ (SEQ ID NO: 45);

[0128] reverse primer E-box1: 5′-ctgagggttCATCTGccggccacagc-3′ (SEQ ID NO: 46);

[0129] forward primer E-box3: 5′-gctccgggctCATCTGgctgcagc-3′ (SEQ ID NO: 47);

[0130] reverse primer E-box3: 5′-gctgcagcCAGATGagccccggagc-3′ (SEQ ID NO: 48).
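The cloning and mutagenesis steps above rely on a few sequence facts that are easy to check by script: where the named restriction sites fall in a sequence, and whether a mutagenesis primer pair is exactly complementary. The sketch below is illustrative only. The recognition sequences are the standard ones for these commercial enzymes, the toy insert is hypothetical (not a real construct), and the helper ignores blunting, cut offsets and methylation sensitivity.

```python
# Illustrative sequence checks for the cloning and mutagenesis steps.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Standard 5'->3' recognition sequences of the enzymes named above. All six
# are palindromic, so scanning one strand finds every site.
RECOGNITION_SITES = {
    "ClaI": "ATCGAT",
    "XbaI": "TCTAGA",
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "HindIII": "AAGCTT",
    "SrfI": "GCCCGGGC",
}

# E-box1 mutagenesis pair (SEQ ID NO: 45 and 46); upper case = mutated core.
FWD_EBOX1 = "gctgtggccggCAGATGaaccctcag"
REV_EBOX1 = "ctgagggttCATCTGccggccacagc"

# Hypothetical toy insert, used only to exercise map_sites().
TOY_INSERT = "GGTACC" + "AAA" + "ATCGAT" + "TTTT" + "TCTAGA" + "AAGCTT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (case is normalized)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def map_sites(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    found: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)  # also catches overlapping sites
        if positions:
            found[enzyme] = positions
    return found


# A site-directed mutagenesis pair should be exact reverse complements.
pair_ok = reverse_complement(FWD_EBOX1) == REV_EBOX1.upper()
# The mutated core should carry no CACCT motif on either strand.
site_free = all(m not in FWD_EBOX1.upper() for m in ("CACCT", "AGGTG"))

if __name__ == "__main__":
    print(map_sites(TOY_INSERT))
    print("E-box1 pair complementary:", pair_ok, "| CACCT-free:", site_free)
```

Run on the E-box3 pair as printed above, the same complementarity test returns False, because the two primers differ in length by one nucleotide.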

[0131] Stable transfection of cells: The MDCK-Tetoff cell line was stably transfected using the LipofectAMINE PLUS™ method (Gibco BRL, Rockville, Md.). 2000 cells were grown in a 75 cm² Falcon flask for 24 h and then transfected with 30 μg of pUHD10.3-SIP1 plasmid plus 3 μg of pPHT plasmid. The latter is a pPNT derivative that confers resistance to hygromycin (80). Stable MDCK-Tetoff transfectants (MDCK-Tetoff-SIP1) were selected with hygromycin B (150 units/ml) (Duchefa Biochemie, Haarlem, NL) for 2 weeks. Induction of SIP1 was prevented by adding tetracycline (1 μg/μl) (Sigma Chemicals, US). SIP1 expression was induced by washing away the tetracycline at the time of subcloning. Stable clones with reliable induction properties were identified by immunofluorescence using anti-Myc tag antibodies.

[0132] Promoter reporter assays: MCF7/AZ cells were transiently transfected using FuGENE 6 (Roche; Basel, CH). NMe and MDA-MB231 cells were transfected with the LIPOFECTAMINE procedure (Gibco BRL; Rockville, Md.), and the parental MDCK cell line was transiently transfected with LIPOFECTAMINE PLUS™ (Gibco BRL; Rockville, Md.). For transient transfection, about 200,000 cells were seeded per 10-cm² well. After incubation for 24 h, 600 ng of each plasmid DNA was transfected. The medium was refreshed 24 h after transfection. Cells were lysed after 3 days in GALACTO-STAR™ kit lysis solution (Tropix, Bedford, Mass.). Transfection efficiency was normalized by measuring β-galactosidase, encoded by the cotransfected pUT651 plasmid (Eurogentec; Seraing, BE). Luciferase substrate was added to each sample; for β-galactosidase detection, a chemiluminescent substrate (Tropix, Bedford, Mass.) was used. Luciferase and β-galactosidase activities were assayed in a Topcount microplate scintillation reader (Packard Instrument Co., Meriden, Conn.).
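A percent decrease in promoter activity, such as the 80% reported for SIP1, is computed from luciferase signals normalized to the cotransfected β-galactosidase control. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, activity is taken as the mean luciferase/β-galactosidase ratio over replicate wells, and the readings are invented placeholders, not measurements from this study.

```python
from statistics import mean


def normalized_ratio(replicates: list[tuple[float, float]]) -> float:
    """Mean luciferase / beta-galactosidase ratio over replicate wells.

    Each replicate is (luciferase_counts, beta_gal_counts). Dividing by the
    beta-gal signal corrects for well-to-well transfection efficiency.
    """
    return mean(luc / bgal for luc, bgal in replicates)


def percent_repression(control: float, with_sip1: float) -> float:
    """Percent decrease of normalized promoter activity caused by SIP1."""
    return 100.0 * (control - with_sip1) / control


# Hypothetical readings for illustration only (not data from this study).
reporter_only = [(52000.0, 10400.0), (48000.0, 9600.0)]
reporter_plus_sip1 = [(10000.0, 10000.0), (9000.0, 9000.0)]

ctrl = normalized_ratio(reporter_only)
sip1 = normalized_ratio(reporter_plus_sip1)
print(f"repression: {percent_repression(ctrl, sip1):.1f}%")
```

Averaging per-well ratios, rather than dividing summed luciferase by summed β-galactosidase, keeps each well's own efficiency correction; either convention is a design choice the protocol above does not specify.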

[0133] Northern analysis: Total RNA was isolated with the RNeasy kit (Qiagen; Chatsworth, Calif.) following the manufacturer's protocol. Total RNA (25 μg) was glyoxylated, size-fractionated on a 1% agarose gel and transferred onto a Hybond-N⁺ membrane (Amersham Pharmacia Biotech, Rainhalm, UK). Hybridizations were performed as described before (81). The mouse SIP1 probe (459 bp) was generated by an EcoRI digest of the mouse SIP1 cDNA. The human SIP1 probe (707 bp) was created by a BstEII-NotI digest of the KIAA0569 clone (Kazusa DNA Research Institute). The mouse E-cadherin probe was a SacI fragment (500 bp) of the mouse E-cadherin cDNA. Two degenerate primers, 5′-CTTCCAGCAGCCCTACGAYCARGCNCA-3′ (SEQ ID NO: 49) and 5′-GGGTGTGGGACCGGATRTGCATYTTNAT-3′ (SEQ ID NO: 50), were used to amplify a fragment of the dog Snail cDNA from a total cDNA population of the MDCK cell line. Cloning and sequencing of the amplified band revealed a 432 bp cDNA fragment. To control the amount of loaded RNA, a GAPDH probe was used on the same blot. The radioactive bands were quantified on a Phosphor Imager 425 (BioRad, Richmond, Calif.).
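Each degenerate Snail primer is a pool of concrete oligonucleotides defined by the standard IUPAC ambiguity codes (Y = C/T, R = A/G, N = any base). The sketch below, with hypothetical helper names, computes each primer's degeneracy and enumerates the pool; both primers contain one Y, one R and one N, so each encodes 2 × 2 × 4 = 16 sequences.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

SNAIL_FWD = "CTTCCAGCAGCCCTACGAYCARGCNCA"   # SEQ ID NO: 49
SNAIL_REV = "GGGTGTGGGACCGGATRTGCATYTTNAT"  # SEQ ID NO: 50


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence in the degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, primer in (("forward", SNAIL_FWD), ("reverse", SNAIL_REV)):
        print(name, len(primer), "nt, degeneracy", degeneracy(primer))
```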

[0134] Immunofluorescence Assays and Antibodies: Cells of interest were grown on glass coverslips. Fixation was by standard procedures (82). The following antibodies were used: the rat monoclonal antibody DECMA-1 (Sigma; Irvine, UK), which recognizes both mouse and dog E-cadherin, and the mouse anti-Myc tag antibody (Oncogene, Cambridge, Mass.). The secondary antibodies were Alexa 488-coupled anti-rat Ig and Alexa 594-coupled anti-mouse Ig.

[0135] Cell Aggregation Assay: Single-cell suspensions were prepared according to an E-cadherin-saving procedure (83). Cells were incubated in an isotonic buffer containing 1.25 mM Ca²⁺ under gyratory shaking (New Brunswick Scientific, New Brunswick, N.J.) at 80 rpm for 30 min. Particle diameters were measured in a Coulter LS200 particle size counter (Coulter, Lake Placid, N.Y.) at the start (N₀) and after 30 min of incubation (N₃₀) and plotted against percentage volume distribution.
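A percentage volume distribution weights each particle by its volume rather than counting it once, so a few large aggregates can dominate the curve. The sketch below shows one way to turn a list of particle diameters into such a distribution. It assumes spherical particles (volume proportional to diameter cubed; the π/6 factor cancels), and the bin edges, function name and example diameters are hypothetical placeholders, not the instrument's own algorithm or data.

```python
from bisect import bisect_right


def percent_volume_distribution(diameters: list[float], edges: list[float]) -> list[float]:
    """Percentage of total particle volume falling in each diameter bin.

    Assumes spherical particles, so volume scales with diameter cubed.
    Bin i covers edges[i] <= d < edges[i + 1]; particles outside the
    edges are ignored.
    """
    volumes = [0.0] * (len(edges) - 1)
    for d in diameters:
        i = bisect_right(edges, d) - 1
        if 0 <= i < len(volumes):
            volumes[i] += d ** 3
    total = sum(volumes)
    return [100.0 * v / total for v in volumes] if total else volumes


if __name__ == "__main__":
    # Hypothetical diameters (micrometres): single cells at N0, some aggregates at N30.
    edges = [0.0, 20.0, 60.0, 200.0]
    n0 = [12.0] * 50
    n30 = [12.0] * 10 + [40.0] * 5
    print("N0 :", percent_volume_distribution(n0, edges))
    print("N30:", percent_volume_distribution(n30, edges))
```

Comparing the N₀ and N₃₀ curves then shows aggregation as a shift of volume toward the larger-diameter bins.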

[0136] Collagen Invasion Assay: Six-well plates were filled with 1.25 ml of neutralized type I collagen (Upstate Biotechnology, Lake Placid, N.Y.) per well. Incubation for at least 1 h at 37° C. was needed for gelification. Single-cell suspensions were seeded on top of the collagen gel, and cultures were incubated at 37° C. for 24 h. Using an inverted microscope controlled by a computer program, we counted the invasive and superficial cells in 12 fields of 0.157 mm². The invasion index expresses the percentage of cells invading the gel over the total number of cells (84).
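The invasion index defined above is invasive cells divided by all counted cells (invasive plus superficial), expressed as a percentage. A minimal sketch follows, assuming per-field counts are pooled over the 12 fields before dividing; the function name and the counts are hypothetical.

```python
FIELD_AREA_MM2 = 0.157  # area of one counted field, from the protocol
N_FIELDS = 12


def invasion_index(invasive: list[int], superficial: list[int]) -> float:
    """Percentage of counted cells that invaded the collagen gel.

    invasive[i] and superficial[i] are the cell counts in microscope field i;
    counts are pooled over all fields before dividing.
    """
    if len(invasive) != len(superficial):
        raise ValueError("need one invasive and one superficial count per field")
    n_invasive = sum(invasive)
    total = n_invasive + sum(superficial)
    return 100.0 * n_invasive / total


surveyed_area = N_FIELDS * FIELD_AREA_MM2  # total area counted per gel

if __name__ == "__main__":
    # Hypothetical counts for 12 fields, illustration only.
    print(invasion_index([5] * 12, [15] * 12), round(surveyed_area, 3))  # 25.0 1.884
```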

[0137] Wound Assay: The wound assay was performed as described before (85). Briefly, wounded monolayers were cultured for 24 h in serum-deprived medium in the presence or absence of tetracycline. Cell migration was assessed by measuring the wound distance, and migration results are expressed as the average wound distance.
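Reporting the average wound distance amounts to summarizing repeated distance measurements per condition. The sketch below computes the mean and sample standard deviation for one set of measurements; the function name and the distances are hypothetical placeholders, and the spread statistic is an addition the protocol does not mention.

```python
from statistics import mean, stdev


def summarize(distances_um: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of wound distances for one condition."""
    return mean(distances_um), stdev(distances_um)


if __name__ == "__main__":
    # Hypothetical distances (micrometres) measured along one wound, illustration only.
    m, sd = summarize([210.0, 190.0, 200.0])
    print(f"average wound distance: {m:.1f} +/- {sd:.1f} um")
```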

[0138] References

[0139] 1. Arora, K., H. Dai, S. G. Kazuko, J. Jamal, M. B. O'Connor, A. Letsou, and R. Warrior. 1995. The Drosophila schnurri gene acts in the Dpp/TGF beta signaling pathway and encodes a transcription factor homologous to the human MBP family. Cell 81:781-90.

[0140] 2. Bussemakers, M. J., L. A. Giroldi, A. van Bokhoven, and J. A. Schalken. 1994. Transcriptional regulation of the human E-cadherin gene in human prostate cancer cell lines: characterization of the human E-cadherin gene promoter. Biochem Biophys Res Commun 203:1284-90.

[0141] 3. Fan, C. M., and T. Maniatis. 1990. A DNA-binding protein containing two widely separated zinc finger motifs that recognize the same DNA sequence. Genes Dev 4:29-42.

[0142] 4. Fortini, M. E., Z. C. Lai, and G. M. Rubin. 1991. The Drosophila zfh-1 and zfh-2 genes encode novel proteins containing both zinc-finger and homeodomain motifs. Mech Dev 34:113-22.

[0143] 5. Funahashi, J., R. Sekido, K. Murai, Y. Kamachi, and H. Kondoh. 1993. Delta-crystallin enhancer binding protein delta EF1 is a zinc finger-homeodomain protein implicated in postgastrulation embryogenesis. Development 119:433-46.

[0144] 6. Grieder, N. C., D. Nellen, R. Burke, K. Basler, and M. Affolter. 1995. Schnurri is required for Drosophila Dpp signaling and encodes a zinc finger protein similar to the mammalian transcription factor PRDII-BF1. Cell 81:791-800.

[0145] 7. Henderson, L. E., T. D. Copeland, R. C. Sowder, G. W. Smythers, and S. Oroszlan. 1981. Primary structure of the low molecular weight nucleic acid-binding proteins of murine leukemia viruses. J Biol Chem 256:8400-6.

[0146] 8. Hendrickson, W., and R. Schleif. 1985. A dimer of AraC protein contacts three adjacent major groove regions of the araI DNA site. Proc Natl Acad Sci USA 82:3129-33.

[0147] 9. Holmberg, S., and P. Schjerling. 1996. Cha4p of Saccharomyces cerevisiae activates transcription via serine/threonine response elements. Genetics 144:467-78.

[0148] 10. Ikeda, K., and K. Kawakami. 1995. DNA binding through distinct domains of zinc-finger-homeodomain protein AREB6 has different effects on gene transcription. Eur J Biochem 233:73-82.

[0149] 11. Jiang, Y., V. C. Yu, F. Buchholz, S. O'Connell, S. J. Rhodes, C. Candeloro, Y. R. Xia, A. J. Lusis, and M. G. Rosenfeld. 1996. A novel family of Cys-Cys, His-Cys zinc finger transcription factors expressed in developing nervous system and pituitary gland. J Biol Chem 271:10723-30.

[0150] 12. Kim, J. G., and L. D. Hudson. 1992. Novel member of the zinc finger superfamily: A C2-HC finger that recognizes a glia-specific gene. Mol Cell Biol 12:5632-9.

[0151] 13. Kretzschmar, M., and J. Massague. 1998. SMADs: mediators and regulators of TGF-beta signaling. Curr Opin Genet Dev 8:103-11.

[0152] 14. Kuhnlein, R. P., G. Frommer, M. Friedrich, M. Gonzalez-Gaitan, A. Weber, J. F. Wagner-Bernholz, W. J. Gehring, H. Jackle, and R. Schuh. 1994. spalt encodes an evolutionarily conserved zinc finger protein of novel structure which provides homeotic gene function in the head and tail region of the Drosophila embryo. Embo J 13:168-79.

[0153] 15. Kurokawa, M., K. Mitani, K. Irie, T. Matsuyama, T. Takahashi, S. Chiba, Y. Yazaki, K. Matsumoto, and H. Hirai. 1998. The oncoprotein Evi-1 represses TGF-beta signalling by inhibiting Smad3. Nature 394:92-6.

[0154] 16. Latinkic, B. V., M. Umbhauer, K. A. Neal, W. Lerchner, J. C. Smith, and V. Cunliffe. 1997. The Xenopus Brachyury promoter is activated by FGF and low concentrations of activin and suppressed by high concentrations of activin and by paired-type homeodomain proteins [published erratum appears in Genes Dev 1998 Apr 15;12(8):1240]. Genes Dev 11:3265-76.

[0155] 17. Lerchner, W., J. E. Remacle, D. Huylebroeck, and J. C. Smith.Unpublished observations.

[0156] 18. Maxam, A. M., and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol 65:499-560.

[0157] 19. Miller, J., A. D. McLachlan, and A. Klug. 1985. Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. Embo J 4:1609-14.

[0158] 20. Morishita, K., K. Suzukawa, T. Taki, J. N. Ihle, and J. Yokota. 1995. EVI-1 zinc finger protein works as a transcriptional activator via binding to a consensus sequence of GACAAGATAAGATAAN1-28CTCATCTTC. Oncogene 10:1961-7.

[0159] 21. Mount, S. M., and G. M. Rubin. 1985. Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Mol Cell Biol 5:1630-8.

[0160] 22. Nucifora, G. 1997. The EVI1 gene in myeloid leukemia.Leukemia 11:2022-31.

[0161] 23. Postigo, A. A., and D. C. Dean. 1997. ZEB, a vertebrate homolog of Drosophila Zfh-1, is a negative regulator of muscle differentiation. Embo J 16:3935-43.

[0162] 24. Rajavashisth, T. B., A. K. Taylor, A. Andalibi, K. L. Svenson, and A. J. Lusis. 1989. Identification of a zinc finger protein that binds to the sterol regulatory element. Science 245:640-3.

[0163] 25. Ray, D., R. Bosselut, J. Ghysdael, M. G. Mattei, A. Tavitian, and F. Moreau-Gachelin. 1992. Characterization of Spi-B, a transcription factor related to the putative oncoprotein Spi-1/PU.1. Mol Cell Biol 12:4297-304.

[0164] 26. Rosen, G. D., J. L. Barks, M. F. Iademarco, R. J. Fisher, and D. C. Dean. 1994. An intricate arrangement of binding sites for the Ets family of transcription factors regulates activity of the alpha 4 integrin gene promoter. J Biol Chem 269:15652-60.

[0165] 27. Rupp, R. A., L. Snider, and H. Weintraub. 1994. Xenopus embryos regulate the nuclear localization of XMyoD. Genes Dev 8:1311-23.

[0166] 28. Schwabe, J. W., and D. Rhodes. 1991. Beyond zinc fingers: steroid hormone receptors have a novel structural motif for DNA recognition. Trends Biochem Sci 16:291-6.

[0167] 29. Seeler, J. S., C. Muchardt, A. Suessle, and R. B. Gaynor. 1994. Transcription factor PRDII-BF1 activates human immunodeficiency virus type 1 gene expression. J Virol 68:1002-9.

[0168] 30. Sekido, R., K. Murai, J. Funahashi, Y. Kamachi, A. Fujisawa-Sehara, Y. Nabeshima, and H. Kondoh. 1994. The delta-crystallin enhancer-binding protein delta EF1 is a repressor of E2-box-mediated gene activation. Mol Cell Biol 14:5692-700.

[0169] 31. Sekido, R., K. Murai, Y. Kamachi, and H. Kondoh. 1997. Two mechanisms in the action of repressor deltaEF1: binding site competition with an activator and active repression. Genes Cells 2:771-83.

[0170] 32. Todd, R. B., and A. Andrianopoulos. 1997. Evolution of a fungal regulatory gene family: the Zn(II)2Cys6 binuclear cluster DNA binding motif. Fungal Genet Biol 21:388-405.

[0171] 33. van't Veer, L. J., P. M. Lutz, K. J. Isselbacher, and R. Bernards. 1992. Structure and expression of major histocompatibility complex-binding protein 2, a 275-kDa zinc finger protein that binds to an enhancer of major histocompatibility complex class I genes. Proc Natl Acad Sci USA 89:8971-5.

[0172] 34. Verschueren, K., J. E. Remacle, C. Collart, H. Kraft, B. S. Baker, P. Tylzanowski, L. Nelles, G. Wuytens, M. T. Su, R. Bodmer, J. Smith, and D. Huylebroeck. 1999. SIP1, a novel zinc finger/homeodomain repressor, interacts with Smad proteins and binds to 5′-CACCT sequences in candidate target genes. J Biol Chem 274:20489-98.

[0173] 35. Watanabe, Y., K. Kawakami, Y. Hirayama, and K. Nagano. 1993. Transcription factors positively and negatively regulating the Na,K-ATPase alpha 1 subunit gene. J Biochem (Tokyo) 114:849-55.

[0174] 36. Yee, K. S., and V. C. Yu. 1998. Isolation and characterization of a novel member of the neural zinc finger factor/myelin transcription factor family with transcriptional repression activity. J Biol Chem 273:5366-74.

[0175] 37. Brent, R. and Ptashne, M. (1985). A eukaryotic transcriptional activator bearing the DNA specificity of a prokaryotic repressor. Cell 43, 729-736.

[0176] 38. Chien, C. T., Bartel, P. L., Sternglanz, R., and Fields, S. (1991). The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U.S.A. 88, 9578-9582.

[0177] 39. Durfee, T., Becherer, K., Chen, P. L., Yeh, S. H., Yang, Y., Kilburn, A. E., Lee, W. H., and Elledge, S. J. (1993). The retinoblastoma protein associates with the protein phosphatase type 1 catalytic subunit. Genes Dev. 7, 555-569.

[0178] 40. Gyuris, J., Golemis, E., Chertkov, H., and Brent, R. (1993).Cdi1, a human G1 and S phase protein phosphatase that associates withCdk2. Cell 75, 791-803.

[0179] 41. Silver, P. A., Brent, R., and Ptashne, M. (1986). DNA binding is not sufficient for nuclear localisation of regulatory proteins in Saccharomyces cerevisiae. Mol. Cell Biol. 6, 4763-4766.

[0180] 42. Yocum, R. R., Hanley, S., West, R. J., and Ptashne, M. (1984). Use of lacZ fusions to delimit regulatory elements of the inducible divergent GAL1-GAL10 promoter in Saccharomyces cerevisiae. Mol. Cell Biol. 4, 1985-1998.

[0181] 43. de Groot, R. P. and Kruijer, W. (1990) Transcriptional activation by TGF beta 1 mediated by the dyad symmetry element (DSE) and the TPA responsive element (TRE). Biochem. Biophys. Res. Commun., 168, 1074-1081.

[0182] 44. Kroll, K. L. and Amaya, E. (1996) Transgenic Xenopus embryos from sperm nuclear transplantations reveal FGF signaling requirements during gastrulation. Development, 122, 3173-3183.

[0183] 45. Nieuwkoop, P. D. and Faber, J. (1967) Normal Table of Xenopus laevis (Daudin). Amsterdam, North Holland.

[0184] 46. Frixen, U. H. et al. E-cadherin-mediated cell-cell adhesion prevents invasiveness of human carcinoma cells. Journal of Cell Biology 113, 173-185 (1991).

[0185] 47. Vleminckx, K., Vakaet Jr, L., Mareel, M., Fiers, W. & van Roy, F. Genetic manipulation of E-cadherin expression by epithelial tumour cells reveals an invasion suppressor role. Cell 66, 107-119 (1991).

[0186] 48. Perl, A. K., Wilgenbus, P., Dahl, U., Semb, H. & Christofori, G. A causal role for E-cadherin in the transition from adenoma to carcinoma. Nature (London) 392, 190-193 (1998).

[0187] 49. Potter, E., Bergwitz, C. & Brabant, G. The cadherin-catenin system: Implications for growth and differentiation of endocrine tissues. Endocrine Reviews 20, 207-239 (1999).

[0188] 50. Becker, K. F. et al. E-cadherin gene mutations provide clues to diffuse type gastric carcinomas. Cancer Research 54, 3845-3852 (1994).

[0189] 51. Berx, G., Nollet, F. & van Roy, F. Dysregulation of the E-cadherin/catenin complex by irreversible mutations in human carcinomas. Cell Adhesion and Communication 6, 171-184 (1998).

[0190] 52. Brabant, G. et al. E-cadherin: a differentiation marker in thyroid malignancies. Cancer Research 53, 4987-4993 (1993).

[0191] 53. Graff, J. R. et al. E-cadherin expression is silenced by DNA hypermethylation in human breast and prostate carcinomas. Cancer Research 55, 5195-5199 (1995).

[0192] 54. Yoshiura, K. et al. Silencing of the E-cadherin invasion-suppressor gene by CpG methylation in human carcinomas. Proceedings of the National Academy of Sciences of the United States of America 92, 7416-7419 (1995).

[0193] 55. Behrens, J., Löwrick, O., Klein-Hitpass, L. & Birchmeier, W. The E-cadherin promoter: functional analysis of a G·C-rich region and an epithelial cell-specific palindromic regulatory element. Proceedings of the National Academy of Sciences of the United States of America 88, 11495-11499 (1991).

[0194] 56. Giroldi, L. A. et al. Role of E boxes in the repression of E-cadherin expression. Biochemical and Biophysical Research Communications 241, 453-458 (1997).

[0195] 57. Hennig, G. et al. Progression of carcinoma cells is associated with alterations in chromatin structure and factor binding at the E-cadherin promoter in vivo. Oncogene 11, 475-484 (1995).

[0196] 58. Ji, X. D., Woodard, A. S., Rimm, D. L. & Fearon, E. R. Transcriptional defects underlie loss of E-cadherin expression in breast cancer. Cell Growth & Differentiation 8, 773-778 (1997).

[0197] 59. Hajra, K. M., Ji, X. D. & Fearon, E. R. Extinction of E-cadherin expression in breast cancer via a dominant repression pathway acting on proximal promoter elements. Oncogene 18, 7274-7279 (1999).

[0198] 60. Miettinen, P. J., Ebner, R., Lopez, A. R. & Derynck, R. TGF-beta induced transdifferentiation of mammary epithelial cells to mesenchymal cells: involvement of type I receptors. Journal of Cell Biology 127, 2021-2036 (1994).

[0199] 61. Shiozaki, H. et al. Effect of epidermal growth factor on cadherin-mediated adhesion in a human oesophageal cancer cell line. British Journal of Cancer 71, 250-258 (1995).

[0200] 62. Reichmann, E. et al. Activation of an inducible c-FosER fusion protein causes loss of epithelial polarity and triggers epithelial-fibroblastoid cell conversion. Cell 71, 1103-1116 (1992).

63. Batsche, E., Muchardt, C., Behrens, J., Hurst, H. C. & Cremisi, C. RB and c-Myc activate expression of the E-cadherin gene in epithelial cells through interaction with transcription factor AP-2. Molecular and Cellular Biology 18, 1-12 (1998).

[0201] 64. Torban, E. & Goodyer, P. R. Effects of PAX2 expression in a human fetal kidney (HEK293) cell line. Biochimica et Biophysica Acta, Molecular Cell Research 1401, 53-62 (1998).

[0202] 65. Spath, G. F. & Weiss, M. C. Hepatocyte nuclear factor 4 provokes expression of epithelial marker genes, acting as a morphogen in dedifferentiated hepatoma cells. Journal of Cell Biology 140, 935-946 (1998).

[0203] 66. Batlle, E. et al. The transcription factor Snail is a repressor of E-cadherin gene expression in epithelial tumour cells. Nature Cell Biology 2, 84-89 (2000).

[0204] 67. Cano, A. et al. The transcription factor Snail controls epithelial-mesenchymal transitions by repressing E-cadherin expression. Nature Cell Biology 2, 76-83 (2000).

[0205] 68. Remacle, J. E. et al. New mode of DNA binding of multi-zinc finger transcription factors: deltaEF1 family members bind with two hands to two target sites. EMBO Journal 18, 5073-5084 (1999).

[0206] 69. Verschueren, K. et al. SIP1, a novel zinc finger/homeodomain repressor, interacts with Smad proteins and binds to 5′-CACCT sequences in candidate target genes. Journal of Biological Chemistry 274, 20489-20498 (1999).

[0207] 70. Derynck, R., Zhang, Y. & Feng, X. H. Smads: transcriptional activators of TGF-beta responses. Cell 95, 737-740 (1998).

[0208] 71. Massague, J. TGF-beta signal transduction. Annual Review of Biochemistry 67, 753-791 (1998).

[0209] 72. Andre, F. et al. Integrins and E-cadherin cooperate with IGF-I to induce migration of epithelial colonic cells. International Journal of Cancer 83, 497-505 (1999).

[0210] 73. Hirohashi, S. Inactivation of the E-cadherin-mediated cell adhesion system in human cancers. American Journal of Pathology 153, 333-339 (1998).

[0211] 74. Bird, A. P. & Wolffe, A. P. Methylation-induced repression: belts, braces, and chromatin. Cell 99, 451-454 (1999).

[0212] 75. Wotton, D., Lo, R. S., Lee, S. & Massague, J. A Smad transcriptional corepressor. Cell 97, 29-39 (1999).

[0213] 76. Cameron, E. E., Bachman, K. E., Myohanen, S., Herman, J. G. & Baylin, S. B. Synergy of demethylation and histone deacetylase inhibition in the re-expression of genes silenced in cancer. Nature Genetics 21, 103-107 (1999).

[0214] 77. Gossen, M. et al. Transcriptional activation by tetracyclines in mammalian cells. Science (Washington D.C.) 268, 1766-1769 (1995).

[0215] 78. Bracke, M. E., Van Larebeke, N. A., Vyncke, B. M. & Mareel, M. M. Retinoic acid modulates both invasion and plasma membrane ruffling of MCF-7 human mammary carcinoma cells in vitro. British Journal of Cancer 63, 867-872 (1991).

[0216] 79. Gossen, M. & Bujard, H. Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proceedings of the National Academy of Sciences of the United States of America 89, 5547-5551 (1992).

[0217] 80. Tybulewicz, V. L. J., Crawford, C. E., Jackson, P. K., Bronson, R. T. & Mulligan, R. C. Neonatal lethality and lymphopenia in mice with a homozygous disruption of the c-abl proto-oncogene. Cell 65, 1153-1163 (1991).

[0218] 81. Bussemakers, M. J. G., Van de Ven, W. J. M., Debruyne, F. M. J. & Schalken, J. A. Identification of High Mobility Group Protein I(Y) as potential progression marker for prostate cancer by differential hybridization analysis. Cancer Research 51, 606-611 (1991).

[0219] 82. van Hengel, J., Vanhoenacker, P., Staes, K. & van Roy, F. Nuclear localization of the p120ctn Armadillo-like catenin is counteracted by a nuclear export signal and by E-cadherin expression. Proceedings of the National Academy of Sciences of the United States of America 96, 7980-7985 (1999).

[0220] 83. Bracke, M. E. et al. Insulin-like growth factor I activates the invasion suppressor function of E-cadherin in MCF-7 human mammary carcinoma cells in vitro. British Journal of Cancer 68, 282-289 (1993).

[0221] 84. Bracke, M. E., Boterberg, T., Bruyneel, E. A. & Mareel, M. M. in Metastasis Methods and Protocols (eds. Brooks, S. & Schumacher, U.) In press (Humana Press, Totowa, 1999).

85. Andre, F. et al. Protein kinase C-gamma and -delta are involved in insulin-like growth factor I-induced migration of colonic epithelial cells. Gastroenterology 116, 64-77 (1999).

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 49

<210> SEQ ID NO 1
<211> LENGTH: 5
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: Portion of bait for screening
<400> SEQUENCE: 1
cacct 5

<210> SEQ ID NO 2
<211> LENGTH: 6
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: portion of bait for screening
<400> SEQUENCE: 2
cacctg 6

<210> SEQ ID NO 3
<211> LENGTH: 5
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: portion of bait for screening
<400> SEQUENCE: 3
aggtg 5

<210> SEQ ID NO 4
<211> LENGTH: 7
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: consensus element for binding of MyT1, NZF-1 and NZF-3
<400> SEQUENCE: 4
aaagttt 7

<210> SEQ ID NO 5
<211> LENGTH: 52
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: complex consensus sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (16)..(43)
<223> OTHER INFORMATION: nucleotides 16-43 represent a spacer sequence wherein any one, more, or all of nucleotides 16-43 may be present or absent
<400> SEQUENCE: 5
gacaagataa gataannnnn nnnnnnnnnn nnnnnnnnnn nnnctcatct tc 52

<210> SEQ ID NO 6
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: primer SIP1 NZF3Mut
<400> SEQUENCE: 6
ccacctgaaa gaatccctga gaattcacag 30

<210> SEQ ID NO 7
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: primer SIP1 CZF2Mut
<400> SEQUENCE: 7
gggtcctaca gttcatctat cagcagcaag 30

<210> SEQ ID NO 8
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: primer SIP1 NZF4Mut
<400> SEQUENCE: 8
caccacctta tcgagtcctc gaggctgcac 30

<210> SEQ ID NO 9
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: primer SIP1 CZF3Mut
<400> SEQUENCE: 9
tcctactcgc agtccatgaa tcacaggtac 30

<210> SEQ ID NO 10
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-WT
<400> SEQUENCE: 10
atccaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 11
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-D
<400> SEQUENCE: 11
atccaggcca cctaaaatat agaatgataa agtgaccaga tgtcagttct 50

<210> SEQ ID NO 12
<211> LENGTH: 23
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-E
<400> SEQUENCE: 12
taaagtgacc aggtgtcagt tct 23

<210> SEQ ID NO 13
<211> LENGTH: 27
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-F
<400> SEQUENCE: 13
atccaggcca cctaaaatat agaatga 27

<210> SEQ ID NO 14
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Rdm+ Xbra-E
<400> SEQUENCE: 14
caatttagag tactgtgtac ttgggagtaa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 15
<211> LENGTH: 53
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-F + AREB6
<400> SEQUENCE: 15
atccaggcca cctaaaatat agaatgaggc tcagacaggt gtagaattcg gcg 53

<210> SEQ ID NO 16
<211> LENGTH: 53
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Rdm + AREB6
<400> SEQUENCE: 16
caatttagag tactgtgtac ttgggagggc tcagacaggt gtagaattcg gcg 53

<210> SEQ ID NO 17
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-J
<400> SEQUENCE: 17
gcacaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 18
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-K
<400> SEQUENCE: 18
atcactgcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 19
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-L
<400> SEQUENCE: 19
atccagtaaa cctaaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 20
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-M
<400> SEQUENCE: 20
atccaggccc aataaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 21
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<223> OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-N
<400> SEQUENCE: 21
atccaggcca ccgccaatat agaatgataa agtgaccagg tgtcagttct 50

<210> SEQ ID NO 22
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM:
ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: probe Xbra-O <400>SEQUENCE: 22 atccaggcca cctaaccgat agaatgataa agtgaccagg tgtcagttct 50<210> SEQ ID NO 23 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-P<400> SEQUENCE: 23 atccaggcca cctaaaatcg cgaatgataa agtgaccaggtgtcagttct 50 <210> SEQ ID NO 24 <211> LENGTH: 50 <212> TYPE: DNA <213>ORGANISM: Artificial Sequence <220> FEATURE: <221> NAME/KEY:misc_feature <223> OTHER INFORMATION: Description of ArtificialSequence: probe Xbra-Q <400> SEQUENCE: 24 atccaggcca cctaaaatatatcctgataa agtgaccagg tgtcagttct 50 <210> SEQ ID NO 25 <211> LENGTH: 50<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <221>NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial Sequence: probe Xbra-R <400> SEQUENCE: 25 atccaggccacctaaaatat agaagtctaa agtgaccagg tgtcagttct 50 <210> SEQ ID NO 26 <211>LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220>FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: probe Xbra-S <400> SEQUENCE: 26atccaggcca tctaaaatat agaatgataa agtgaccagg tgtcagttct 50 <210> SEQ IDNO 27 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: probe Xbra-Z <400>SEQUENCE: 27 atccaggcca cctaaaatat agaatgataa agtgactagg tgtcagttct 50<210> SEQ ID NO 28 <211> LENGTH: 47 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-B<400> SEQUENCE: 28 atccaggcca cctatataga atgataaagt gaccaggtgt cagttct47 <210> SEQ ID NO 29 <211> LENGTH: 47 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: 
misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-C<400> SEQUENCE: 29 atccaggcca cctaaaatat agaatgatgt gaccaggtgt cagttct47 <210> SEQ ID NO 30 <211> LENGTH: 40 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-U<400> SEQUENCE: 30 atccaggcca cctaaaatat agtgaccagg tgtcagttct 40 <210>SEQ ID NO 31 <211> LENGTH: 46 <212> TYPE: DNA <213> ORGANISM: ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: probe Xbra-EE <400>SEQUENCE: 31 taaagtgacc aggtgtcagt tcttaaagtg accaggtgtc agttct 46 <210>SEQ ID NO 32 <211> LENGTH: 46 <212> TYPE: DNA <213> ORGANISM: ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: probe Xbra-ErE <400>SEQUENCE: 32 agaactgaca cctggtcact ttataaagtg accaggtgtc agttct 46 <210>SEQ ID NO 33 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: probe Xbra-FrF <400>SEQUENCE: 33 atccaggcca cctaaaatat agaatattct atattttagg tggcctggat 50<210> SEQ ID NO 34 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Xbra-V<400> SEQUENCE: 34 atccaggcag gtgtaaatat agaatgataa agtgacccacctacagttct 50 <210> SEQ ID NO 35 <211> LENGTH: 50 <212> TYPE: DNA <213>ORGANISM: Artificial Sequence <220> FEATURE: <221> NAME/KEY:misc_feature <223> OTHER INFORMATION: Description of ArtificialSequence: probe Xbra-W <400> SEQUENCE: 35 atccaggcag gtgtaaatatagaatgataa agtgaccagg tgtcagttct 50 <210> SEQ ID NO 36 <211> LENGTH: 60<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <221>NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial 
Sequence: probe alfa-4I-WT (alfa-4-integrin) <400> SEQUENCE:36 gcagggcaca cctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60<210> SEQ ID NO 37 <211> LENGTH: 60 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe alfa-4I-A(alfa-4-integrin) <400> SEQUENCE: 37 gcagggcaca cctggattgc attagaatgagactcactac ccagttcaga tgtgttgcgt 60 <210> SEQ ID NO 38 <211> LENGTH: 60<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <221>NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial Sequence: probe alfa-4I-B (alfa-4-integrin) <400> SEQUENCE:38 gcagggcaca tctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60<210> SEQ ID NO 39 <211> LENGTH: 70 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Ecad-WT<400> SEQUENCE: 39 tggccggcag gtgaaccctc agccaatcag cggtacggggggcggtgctc cggggctcac 60 ctggctgcag 70 <210> SEQ ID NO 40 <211> LENGTH:70 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<221> NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial Sequence: probe Ecad-A <400> SEQUENCE: 40 tggccggcaggtgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcat 60 ctggctgcag 70<210> SEQ ID NO 41 <211> LENGTH: 70 <212> TYPE: DNA <213> ORGANISM:Artificial Sequence <220> FEATURE: <221> NAME/KEY: misc_feature <223>OTHER INFORMATION: Description of Artificial Sequence: probe Ecad-B<400> SEQUENCE: 41 tggccggcag atgaaccctc agccaatcag cggtacggggggcggtgctc cggggctcac 60 ctggctgcag 70 <210> SEQ ID NO 42 <211> LENGTH:21 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<221> NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial Sequence: PCR-primer for E-cadherin promoter sequence(-341/+41) <400> SEQUENCE: 42 acaaaagaac tcagccaagt g 21 <210> SEQ ID NO43 <211> LENGTH: 18 <212> 
TYPE: DNA <213> ORGANISM: Artificial Sequence<220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: PCR-primer for E-cadherin promotersequence (-341/+41) <400> SEQUENCE: 43 ccgcaagctc acaggtgc 18 <210> SEQID NO 44 <211> LENGTH: 26 <212> TYPE: DNA <213> ORGANISM: ArtificialSequence <220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHERINFORMATION: Description of Artificial Sequence: forward primer E-box1<400> SEQUENCE: 44 gctgtggccg gcagatgaac cctcag 26 <210> SEQ ID NO 45<211> LENGTH: 26 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence<220> FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: reverse primer E-box1 <400>SEQUENCE: 45 ctgagggttc atctgccggc cacagc 26 <210> SEQ ID NO 46 <211>LENGTH: 24 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220>FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: forward primer E-box3 <400>SEQUENCE: 46 gctccgggct catctggctg cagc 24 <210> SEQ ID NO 47 <211>LENGTH: 25 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220>FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: reverse primer E-box3 <400>SEQUENCE: 47 gctgcagcca gatgagcccc ggagc 25 <210> SEQ ID NO 48 <211>LENGTH: 27 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220>FEATURE: <221> NAME/KEY: misc_feature <223> OTHER INFORMATION:Description of Artificial Sequence: degenerated primer <220> FEATURE:<221> NAME/KEY: misc_feature <222> LOCATION: (25) <223> OTHERINFORMATION: n is a spacer and may be any nucleotide <400> SEQUENCE: 48cttccagcag ccctacgayc argcnca 27 <210> SEQ ID NO 49 <211> LENGTH: 28<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <221>NAME/KEY: misc_feature <223> OTHER INFORMATION: Description ofArtificial Sequence: degenerated primer <220> FEATURE: <221> NAME/KEY:misc_feature <222> LOCATION: (26) <223> OTHER INFORMATION: n is a spacerand may be 
any nucleotide <400> SEQUENCE: 49 gggtgtggga ccggatrtgcatyttnat 28
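Several mutant probes in the listing differ from their wild-type parent by a single base inside a CACCT or AGGTG site (AGGTG being the reverse complement of CACCT). The following minimal Python sketch makes that comparison explicit for a few probes copied from the listing. It is an illustrative aid only, not part of the disclosed method; the helper names `normalize`, `substitutions` and `motif_sites` are hypothetical.

```python
# Illustrative sketch (hypothetical helpers, not from the disclosure):
# compare mutant probes with their wild-type parent and report point
# substitutions and the CACCT/AGGTG sites that survive in each probe.

MOTIFS = ("CACCT", "AGGTG")  # AGGTG is the reverse complement of CACCT

# Probe sequences copied from the sequence listing (SEQ ID NOs in comments).
PROBES = {
    "Xbra-WT": "atccaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct",  # 10
    "Xbra-D": "atccaggcca cctaaaatat agaatgataa agtgaccaga tgtcagttct",   # 11
    "Xbra-S": "atccaggcca tctaaaatat agaatgataa agtgaccagg tgtcagttct",   # 26
    "Ecad-WT": "tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc "
               "cggggctcac ctggctgcag",                                      # 39
    "Ecad-A": "tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc "
              "cggggctcat ctggctgcag",                                       # 40
    "Ecad-B": "tggccggcag atgaaccctc agccaatcag cggtacgggg ggcggtgctc "
              "cggggctcac ctggctgcag",                                       # 41
}


def normalize(seq: str) -> str:
    """Remove the spaces that group the listing into 10-nt blocks; uppercase."""
    return "".join(seq.split()).upper()


def substitutions(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, wild-type base, mutant base) for every mismatch."""
    wt, mt = normalize(wild_type), normalize(mutant)
    if len(wt) != len(mt):
        raise ValueError("probes differ in length; not a pure substitution")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, mt)) if a != b]


def motif_sites(seq: str) -> list[tuple[int, str]]:
    """List (1-based start, motif) for every CACCT or AGGTG occurrence."""
    s = normalize(seq)
    hits = []
    for motif in MOTIFS:
        start = s.find(motif)
        while start != -1:
            hits.append((start + 1, motif))
            start = s.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    for wt_name, mutants in (("Xbra-WT", ("Xbra-D", "Xbra-S")),
                             ("Ecad-WT", ("Ecad-A", "Ecad-B"))):
        print(wt_name, motif_sites(PROBES[wt_name]))
        for name in mutants:
            print(" ", name, substitutions(PROBES[wt_name], PROBES[name]),
                  motif_sites(PROBES[name]))
```

Run on these probes, the sketch reports that Xbra-WT carries CACCT at position 9 and AGGTG at position 38; Xbra-D differs only at position 40 (G→A, turning AGGTG into AGATG) and Xbra-S only at position 11 (C→T, turning CACCT into CATCT). Ecad-A and Ecad-B likewise each remove exactly one of the two E-cadherin sites (positions 60 and 11, respectively).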

What is claimed is:
1. A process of identifying transcription factors such as activators and/or repressors comprising: providing cells with a nucleic acid sequence comprising at least a sequence CACCT (the first 5 nucleotides of SEQ ID NO: 1), preferably twice a CACCT sequence (the first 5 nucleotides of SEQ ID NO: 1), as bait(s) for the screening of a library encoding potential transcription factors and performing a specificity test to isolate said transcription factors.
2. A process of identifying transcription factors such as activators and/or repressors comprising: providing cells with a nucleic acid sequence comprising one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), or AGGTG-N-AGGTG (SEQ ID NO: 4) as bait, wherein N is a spacer sequence.
3. A process according to claim 1 or claim 2, wherein the transcription factor comprises separated clusters of zinc fingers.
4. A process according to claim 1, claim 2, or claim 3, wherein the sequence originates from a promoter region.
5. A process according to claim 4, wherein the promoter region is selected from the group consisting of Brachyury, α4-integrin, follistatin, and E-cadherin.
6. A transcription factor produced by the process of claim 1, claim 2, claim 3, claim 4, or claim 5.
7. A process for identifying compounds with an interference capability towards transcription factors as defined in claim 6, comprising: adding a sample comprising a potential compound to be identified to a test system comprising (i) a nucleotide sequence comprising one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), or AGGTG-N-AGGTG (SEQ ID NO: 4) as bait, wherein N is a spacer, and (ii) a protein capable of binding said nucleotide sequence; incubating said sample in said system for a period of time sufficient to permit interaction of the compound, or a derivative or counterpart thereof, with said protein; comparing the amount and/or activity of the protein bound to the nucleotide sequence before and after said adding; and identifying and optionally isolating and/or purifying the compound.
8. The process according to claim 7, wherein the protein is a Smad-interacting protein.
9. The process according to claim 8, wherein said Smad-interacting protein is SIP1.
10. A compound produced by the process of claim 7, claim 8, or claim 9.
11. The compound of claim 10, wherein said compound modifies regulation of E-cadherin expression by SIP1.
12. A pharmaceutical composition to prevent tumor invasion and/or metastasis, said pharmaceutical composition comprising: the compound of claim 10 or claim 11 in an amount to prevent tumor invasion and/or metastasis in a subject, and a pharmaceutically acceptable excipient.
13. A test kit to perform the process of claim 7, said test kit comprising: (i) a nucleotide sequence comprising a sequence selected from the group consisting of CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), and AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer sequence, and (ii) a protein capable of binding said nucleotide sequence.
14. A test kit to perform the process of claim 2, said test kit comprising: a nucleic acid sequence comprising one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), or AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer sequence.
15. A method for detecting an interaction between a first interacting protein and a second interacting protein, comprising: providing a suitable host cell with a first fusion protein comprising the first interacting protein fused to a DNA binding domain capable of binding a nucleic acid sequence comprising one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), or AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer sequence; providing said suitable host cell with a second fusion protein comprising the second interacting protein fused to a DNA binding domain capable of binding a nucleic acid sequence comprising one of the sequences CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), or AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer sequence; subjecting said host cell to conditions under which the first interacting protein and the second interacting protein are brought into close proximity; and determining whether a detectable gene present in the host cell and located adjacent to said nucleic acid sequence has been expressed to a greater degree than it would be in the absence of the interaction between the first and the second interacting protein.
16. An isolated nucleic acid sequence comprising a sequence selected from the group consisting of CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), and AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer.
17. A method of identifying a new target gene, said method comprising: identifying said new target gene using a nucleic acid sequence comprising a sequence selected from the group consisting of CACCT (the first five nucleotides of SEQ ID NO: 1), CACCT-N-CACCT (SEQ ID NO: 1), CACCT-N-AGGTG (SEQ ID NO: 2), AGGTG-N-CACCT (SEQ ID NO: 3), and AGGTG-N-AGGTG (SEQ ID NO: 4), wherein N is a spacer.
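The bait definitions recited in the claims (two CACCT/AGGTG half-sites in any of four orientations, separated by a spacer N) can be illustrated with a minimal Python sketch that scans a sequence for such bipartite arrangements. This is an illustrative aid under stated assumptions, not the claimed method: the function names `find_sites` and `bipartite_baits` and the spacer bounds `min_spacer`/`max_spacer` are hypothetical, since the claims do not fix a spacer length.

```python
# Illustrative sketch (hypothetical helpers): find the four bipartite bait
# arrangements CACCT-N-CACCT, CACCT-N-AGGTG, AGGTG-N-CACCT and AGGTG-N-AGGTG.
# The spacer bounds are assumptions; the claims leave the length of N open.

from itertools import combinations

SITES = ("CACCT", "AGGTG")  # AGGTG is the reverse complement of CACCT


def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every CACCT/AGGTG half-site, sorted."""
    s = "".join(seq.split()).upper()
    hits = []
    for motif in SITES:
        start = s.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = s.find(motif, start + 1)
    return sorted(hits)


def bipartite_baits(seq: str, min_spacer: int = 0,
                    max_spacer: int = 50) -> list[tuple[str, int, int]]:
    """Return (arrangement, 1-based start of first site, spacer length N)
    for every pair of half-sites whose spacer lies within the bounds."""
    results = []
    # Sites are sorted by position, so each pair is (upstream, downstream).
    for (i, first), (j, second) in combinations(find_sites(seq), 2):
        spacer = j - (i + len(first))  # nucleotides strictly between the sites
        if min_spacer <= spacer <= max_spacer:
            results.append((f"{first}-N-{second}", i + 1, spacer))
    return results


if __name__ == "__main__":
    examples = {
        "Xbra-WT (SEQ ID NO 10)":
            "atccaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct",
        "Xbra-EE (SEQ ID NO 31)":
            "taaagtgacc aggtgtcagt tcttaaagtg accaggtgtc agttct",
        "Ecad-WT (SEQ ID NO 39)":
            "tggccggcag gtgaaccctc agccaatcag cggtacgggg "
            "ggcggtgctc cggggctcac ctggctgcag",
    }
    for name, seq in examples.items():
        print(name, bipartite_baits(seq))
```

With the default bounds, the sketch finds one CACCT-N-AGGTG pair in Xbra-WT (spacer 24 nt), one AGGTG-N-AGGTG pair in Xbra-EE (spacer 18 nt) and one AGGTG-N-CACCT pair in Ecad-WT (spacer 44 nt). Because AGGTG is the reverse complement of CACCT, scanning a single strand for both pentamers covers half-sites on either strand; pairing every site with every other is quadratic, which is harmless at probe or promoter-fragment lengths.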